{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# 第2篇：DataFrame"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 第1部分：DataFrame简介"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "DataFrame是一个带有索引的二维数据结构，每列可以有自己的名字，并且可以有不同的数据类型，类似于 Excel 、SQL 表，或 Series 对象构成的字典。你可以把它想象成一个excel表格或者数据库中的一张表，DataFrame是最常用的Pandas对象。  \n",
    "上一节我们引入了pandas的基本数据结构Series,它的基本属性包含index、value、name、dtype等，属于一维数据结构。DataFrame是pandas的第二大基本数据结构，可以看成由一个或多个Series构成的二维数据结构，其除了拥有和Series一样的index、value、name、dtype等属性外，最大的特点是还拥有column，即列或标签。从这个角度来讲，Series由于是一维数据结构，所有的数据可以看成有多行一列，可以直接通过行索引取到value，故其没有列名；而DataFrame属于二维数据结构，可以看成多行多列，直接通过行索引取到的是多个列对应的value，因而需要对每列标注名字来识别对应的value，因此有了column。\n",
    "![](https://zhangyafei-1258643511.cos.ap-nanjing.myqcloud.com/Python/blog/dataframe1.png)\n",
    "上图展示了DataFrame与Series之间的关系，我们可以发现二者的区别和联系：\n",
    "- 区别：\n",
    "> series，只是一个一维数据结构，它由index和value(最基本元素)组成。  \n",
    "> dataframe，是一个二维结构，除了拥有index和value之外，还拥有column。\n",
    "- 联系：\n",
    "> dataframe由多个series组成，无论是行还是列，单独拆分出来都是一个series。"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 第2部分：DataFrame的创建"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "DataFrame 是最常用的 Pandas 对象，与 Series 一样，DataFrame 支持多种类型的输入数据：\n",
    "- 方式1：字典套Series\n",
    "- 方式2：字典套列表\n",
    "- 方式3：列表套字典\n",
    "- 方式4：列表套列表（二维数组）\n",
    "- 方式5：DataFrame.from_dict\n",
    "- 方式6：DataFrame.from_record\n",
    "- 方式7：文件中读取"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 导入相关库"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "import pandas as pd\n",
    "import numpy as np"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 方式1：字典套Series\n",
    "> data = {  \n",
    "    \"col1\": pd.Series([],index),  \n",
    "    \"col2\": pd.Series([],index),  \n",
    "    \"col3\": pd.Series([],index)  \n",
    " } \n",
    "\n",
    "**这种方式有以下特点：**\n",
    "- 字典的key为列名\n",
    "- 字典的value为Series\n",
    "- 若DataFrame没有指定index和columns，则以字典的key为列，Series的index为索引\n",
    "- 若DataFrame指定了index或columns，会根据指定的columns去字典中筛选对应的key，根据指定的index去Series中筛选对应的索引，并将筛选的结果放入到生成的DataFrame中"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>name</th>\n",
       "      <th>age</th>\n",
       "      <th>city</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>Tom</td>\n",
       "      <td>18</td>\n",
       "      <td>北京</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>Bob</td>\n",
       "      <td>30</td>\n",
       "      <td>上海</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>Mary</td>\n",
       "      <td>25</td>\n",
       "      <td>广州</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>James</td>\n",
       "      <td>40</td>\n",
       "      <td>深圳</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>Yafei</td>\n",
       "      <td>22</td>\n",
       "      <td>晋城</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "    name  age city\n",
       "0    Tom   18   北京\n",
       "1    Bob   30   上海\n",
       "2   Mary   25   广州\n",
       "3  James   40   深圳\n",
       "4  Yafei   22   晋城"
      ]
     },
     "execution_count": 2,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "data = {\n",
    "    \"name\": pd.Series([\"Tom\", \"Bob\", \"Mary\", \"James\", \"Yafei\"]),\n",
    "    \"age\": pd.Series([18, 30, 25, 40, 22]),\n",
    "    \"city\": pd.Series([\"北京\", \"上海\", \"广州\", \"深圳\", \"晋城\"])\n",
    "}\n",
    "user_info = pd.DataFrame(data)\n",
    "user_info"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**设置name为索引**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>age</th>\n",
       "      <th>city</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>Tom</th>\n",
       "      <td>18</td>\n",
       "      <td>北京</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Bob</th>\n",
       "      <td>30</td>\n",
       "      <td>上海</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Mary</th>\n",
       "      <td>25</td>\n",
       "      <td>广州</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>James</th>\n",
       "      <td>40</td>\n",
       "      <td>深圳</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Yafei</th>\n",
       "      <td>22</td>\n",
       "      <td>晋城</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "       age city\n",
       "Tom     18   北京\n",
       "Bob     30   上海\n",
       "Mary    25   广州\n",
       "James   40   深圳\n",
       "Yafei   22   晋城"
      ]
     },
     "execution_count": 3,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# 指定索引(index)\n",
    "index = [\"Tom\", \"Bob\", \"Mary\", \"James\", \"Yafei\"]\n",
    "data = {\n",
    "    \"age\": pd.Series([18, 30, 25, 40, 22], index=index),\n",
    "    \"city\": pd.Series([\"北京\", \"上海\", \"广州\", \"深圳\", \"晋城\"],index=index)\n",
    "}\n",
    "user_info = pd.DataFrame(data)\n",
    "user_info"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**当Series指定索引，DataFrame中也指定索引时会发生什么呢？**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>age</th>\n",
       "      <th>city</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>Tom</th>\n",
       "      <td>18</td>\n",
       "      <td>北京</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>James</th>\n",
       "      <td>40</td>\n",
       "      <td>深圳</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Yafei</th>\n",
       "      <td>22</td>\n",
       "      <td>晋城</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "       age city\n",
       "Tom     18   北京\n",
       "James   40   深圳\n",
       "Yafei   22   晋城"
      ]
     },
     "execution_count": 4,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# 指定索引(index)\n",
    "index = [\"Tom\", \"Bob\", \"Mary\", \"James\", \"Yafei\"]\n",
    "data = {\n",
    "    \"age\": pd.Series([18, 30, 25, 40, 22], index=index),\n",
    "    \"city\": pd.Series([\"北京\", \"上海\", \"广州\", \"深圳\", \"晋城\"],index=index)\n",
    "}\n",
    "user_info = pd.DataFrame(data, index=[\"Tom\",\"James\", \"Yafei\"])\n",
    "user_info"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**当每列的Series和DataFrame都指定索引且不相同时，DataFrame会根据传递的索引去每个Series中寻找在指定索引范围的值，相当于根据索引对每一列Series进行筛选操作。**  \n",
    "请看下面这个示例，age列和city列为Series对象，但是没有指定index，这是index为默认值，即0-Series长度-1的数字索引，下例中的索引为0-4,。而创建DataFrame时指定的index为name列表，此时会去Series中寻找index属于name列表的元素添加到DataFrame中，发现age列和city列的Series的索引与name列表交集为空，故结果中name索引对应的age和city均为空值。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>age</th>\n",
       "      <th>city</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>Tom</th>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Bob</th>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Mary</th>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>James</th>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Yafei</th>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "       age city\n",
       "Tom    NaN  NaN\n",
       "Bob    NaN  NaN\n",
       "Mary   NaN  NaN\n",
       "James  NaN  NaN\n",
       "Yafei  NaN  NaN"
      ]
     },
     "execution_count": 5,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "data = {\n",
    "    \"age\": pd.Series([18, 30, 25, 40, 22]),\n",
    "    \"city\": pd.Series([\"北京\", \"上海\", \"广州\", \"深圳\", \"晋城\"])\n",
    "}\n",
    "user_info= pd.DataFrame(data, index=[\"Tom\", \"Bob\", \"Mary\", \"James\", \"Yafei\"])\n",
    "user_info"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "RangeIndex(start=0, stop=5, step=1)"
      ]
     },
     "execution_count": 6,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "data['age'].index"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "Index(['Tom', 'Bob', 'Mary', 'James', 'Yafei'], dtype='object')"
      ]
     },
     "execution_count": 7,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "user_info.index"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**可以看到当列索引和DataFrame的索引同时指定时，若不一致的情况下会发生冲突，此时生成的DataFrame会以指定的index为准。**  \n",
    "那么，我们如何将DataFrame的索引设置为name列呢？"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>age</th>\n",
       "      <th>city</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>18</td>\n",
       "      <td>北京</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>30</td>\n",
       "      <td>上海</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>25</td>\n",
       "      <td>广州</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>40</td>\n",
       "      <td>深圳</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>22</td>\n",
       "      <td>晋城</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   age city\n",
       "0   18   北京\n",
       "1   30   上海\n",
       "2   25   广州\n",
       "3   40   深圳\n",
       "4   22   晋城"
      ]
     },
     "execution_count": 8,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "data = {\n",
    "    \"age\": pd.Series([18, 30, 25, 40, 22]),\n",
    "    \"city\": pd.Series([\"北京\", \"上海\", \"广州\", \"深圳\", \"晋城\"])\n",
    "}\n",
    "user_info = pd.DataFrame(data)\n",
    "user_info"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>age</th>\n",
       "      <th>city</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>Tom</th>\n",
       "      <td>18</td>\n",
       "      <td>北京</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Bob</th>\n",
       "      <td>30</td>\n",
       "      <td>上海</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Mary</th>\n",
       "      <td>25</td>\n",
       "      <td>广州</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>James</th>\n",
       "      <td>40</td>\n",
       "      <td>深圳</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Yafei</th>\n",
       "      <td>22</td>\n",
       "      <td>晋城</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "       age city\n",
       "Tom     18   北京\n",
       "Bob     30   上海\n",
       "Mary    25   广州\n",
       "James   40   深圳\n",
       "Yafei   22   晋城"
      ]
     },
     "execution_count": 9,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "user_info.index = [\"Tom\", \"Bob\", \"Mary\", \"James\", \"Yafei\"]\n",
    "user_info"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "上面说了索引的设置时候的注意点，那么columns也是一样的，当传入的data指定的列名与columns不一致时，生成的DataFrame的列会以指定的columns为准，并去data中去查找与columns存在交集的列，并将其放入到DataFrame中。也可以理解为起到了筛选data中列数据的作用。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>city</th>\n",
       "      <th>hobby</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>北京</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>上海</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>广州</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>深圳</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>晋城</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "  city hobby\n",
       "0   北京   NaN\n",
       "1   上海   NaN\n",
       "2   广州   NaN\n",
       "3   深圳   NaN\n",
       "4   晋城   NaN"
      ]
     },
     "execution_count": 10,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "data = {\n",
    "    \"age\": pd.Series([18, 30, 25, 40, 22]),\n",
    "    \"city\": pd.Series([\"北京\", \"上海\", \"广州\", \"深圳\", \"晋城\"])\n",
    "}\n",
    "user_info = pd.DataFrame(data, columns=['city', 'hobby'])\n",
    "user_info"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 方式2：字典套列表"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>name</th>\n",
       "      <th>age</th>\n",
       "      <th>city</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>Tom</td>\n",
       "      <td>18</td>\n",
       "      <td>北京</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>Bob</td>\n",
       "      <td>30</td>\n",
       "      <td>上海</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>Mary</td>\n",
       "      <td>25</td>\n",
       "      <td>广州</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>James</td>\n",
       "      <td>40</td>\n",
       "      <td>深圳</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>Yafei</td>\n",
       "      <td>22</td>\n",
       "      <td>晋城</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "    name  age city\n",
       "0    Tom   18   北京\n",
       "1    Bob   30   上海\n",
       "2   Mary   25   广州\n",
       "3  James   40   深圳\n",
       "4  Yafei   22   晋城"
      ]
     },
     "execution_count": 11,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# 不指定index，index默认为数字，从0开始\n",
    "data = {\n",
    "    \"name\": [\"Tom\", \"Bob\", \"Mary\", \"James\", \"Yafei\"],\n",
    "    \"age\": [18, 30, 25, 40, 22],\n",
    "    \"city\": [\"北京\", \"上海\", \"广州\", \"深圳\", \"晋城\"]\n",
    "}\n",
    "user_info = pd.DataFrame(data)\n",
    "user_info"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>age</th>\n",
       "      <th>city</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>Tom</th>\n",
       "      <td>18</td>\n",
       "      <td>北京</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Bob</th>\n",
       "      <td>30</td>\n",
       "      <td>上海</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Mary</th>\n",
       "      <td>25</td>\n",
       "      <td>广州</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>James</th>\n",
       "      <td>40</td>\n",
       "      <td>深圳</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Yafei</th>\n",
       "      <td>22</td>\n",
       "      <td>晋城</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "       age city\n",
       "Tom     18   北京\n",
       "Bob     30   上海\n",
       "Mary    25   广州\n",
       "James   40   深圳\n",
       "Yafei   22   晋城"
      ]
     },
     "execution_count": 12,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# 指定索引(index)\n",
    "data = {\n",
    "    \"age\": [18, 30, 25, 40, 22],\n",
    "    \"city\": [\"北京\", \"上海\", \"广州\", \"深圳\", \"晋城\"]\n",
    "}\n",
    "index = [\"Tom\", \"Bob\", \"Mary\", \"James\", \"Yafei\"]\n",
    "user_info = pd.DataFrame(data, index=index)\n",
    "user_info"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 方式3：列表套字典"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>name</th>\n",
       "      <th>age</th>\n",
       "      <th>city</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>Tom</td>\n",
       "      <td>18</td>\n",
       "      <td>北京</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>Bob</td>\n",
       "      <td>30</td>\n",
       "      <td>上海</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>Mary</td>\n",
       "      <td>25</td>\n",
       "      <td>广州</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>James</td>\n",
       "      <td>40</td>\n",
       "      <td>深圳</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "    name  age city\n",
       "0    Tom   18   北京\n",
       "1    Bob   30   上海\n",
       "2   Mary   25   广州\n",
       "3  James   40   深圳"
      ]
     },
     "execution_count": 13,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "data = [\n",
    "    {'name':'Tom','age':18,'city': \"北京\"},\n",
    "    {'name':'Bob','age': 30, 'city': '上海'},\n",
    "    {'name':'Mary','age': 25, 'city': '广州'},\n",
    "    {'name':'James','age': 40, 'city': '深圳'}\n",
    "]\n",
    "user_info = pd.DataFrame(data)\n",
    "user_info"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>age</th>\n",
       "      <th>city</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>name</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>Tom</th>\n",
       "      <td>18</td>\n",
       "      <td>北京</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Bob</th>\n",
       "      <td>30</td>\n",
       "      <td>上海</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Mary</th>\n",
       "      <td>25</td>\n",
       "      <td>广州</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>James</th>\n",
       "      <td>40</td>\n",
       "      <td>深圳</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "       age city\n",
       "name           \n",
       "Tom     18   北京\n",
       "Bob     30   上海\n",
       "Mary    25   广州\n",
       "James   40   深圳"
      ]
     },
     "execution_count": 14,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "user_info.set_index(keys='name')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 方式4：二维数组(列表套列表)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>name</th>\n",
       "      <th>age</th>\n",
       "      <th>city</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>Tom</td>\n",
       "      <td>18</td>\n",
       "      <td>北京</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>Bob</td>\n",
       "      <td>30</td>\n",
       "      <td>上海</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>Mary</td>\n",
       "      <td>25</td>\n",
       "      <td>广州</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>James</td>\n",
       "      <td>40</td>\n",
       "      <td>深圳</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "    name  age city\n",
       "0    Tom   18   北京\n",
       "1    Bob   30   上海\n",
       "2   Mary   25   广州\n",
       "3  James   40   深圳"
      ]
     },
     "execution_count": 15,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "data = [['Tom',18, \"北京\"],\n",
    "        ['Bob', 30, \"上海\"],\n",
    "        ['Mary', 25, \"广州\"],\n",
    "        ['James', 40, \"深圳\"]]\n",
    "columns = ['name', 'age', 'city']\n",
    "user_info = pd.DataFrame(data=data, columns=columns)\n",
    "user_info"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>age</th>\n",
       "      <th>city</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>name</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>Tom</th>\n",
       "      <td>18</td>\n",
       "      <td>北京</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Bob</th>\n",
       "      <td>30</td>\n",
       "      <td>上海</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Mary</th>\n",
       "      <td>25</td>\n",
       "      <td>广州</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>James</th>\n",
       "      <td>40</td>\n",
       "      <td>深圳</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "       age city\n",
       "name           \n",
       "Tom     18   北京\n",
       "Bob     30   上海\n",
       "Mary    25   广州\n",
       "James   40   深圳"
      ]
     },
     "execution_count": 16,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "user_info.set_index(keys='name')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>age</th>\n",
       "      <th>city</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>Tom</th>\n",
       "      <td>18</td>\n",
       "      <td>北京</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Bob</th>\n",
       "      <td>30</td>\n",
       "      <td>上海</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Mary</th>\n",
       "      <td>25</td>\n",
       "      <td>广州</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>James</th>\n",
       "      <td>40</td>\n",
       "      <td>深圳</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "       age city\n",
       "Tom     18   北京\n",
       "Bob     30   上海\n",
       "Mary    25   广州\n",
       "James   40   深圳"
      ]
     },
     "execution_count": 17,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "data = [[18, \"北京\"],\n",
    "        [30, \"上海\"],\n",
    "        [25, \"广州\"],\n",
    "        [40, \"深圳\"]]\n",
    "index = ['Tom', 'Bob', 'Mary', 'James']\n",
    "columns = ['age', 'city']\n",
    "user_info = pd.DataFrame(data=data, index=index, columns=columns)\n",
    "user_info"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 方式5：from_dict\n",
    "DataFrame from_dict（）方法用于将Dict转换为DataFrame对象。 此方法接受以下参数。\n",
    "- data ：字典或类似数组的对象来创建DataFrame。\n",
    "- orient ：数据的方向。 允许值为（“columns”，“index”），默认值为“columns”。\n",
    "- columns ：当方向为“索引”时，用作DataFrame标签的值的列表。 如果与列方向一起使用，则会引发ValueError 。"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "当没有指定orient时，默认将key值作为列名。（列排列）"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 18,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>name</th>\n",
       "      <th>age</th>\n",
       "      <th>city</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>Tom</td>\n",
       "      <td>18</td>\n",
       "      <td>北京</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>Bob</td>\n",
       "      <td>30</td>\n",
       "      <td>上海</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>Mary</td>\n",
       "      <td>25</td>\n",
       "      <td>广州</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>James</td>\n",
       "      <td>40</td>\n",
       "      <td>深圳</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>Yafei</td>\n",
       "      <td>22</td>\n",
       "      <td>晋城</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "    name  age city\n",
       "0    Tom   18   北京\n",
       "1    Bob   30   上海\n",
       "2   Mary   25   广州\n",
       "3  James   40   深圳\n",
       "4  Yafei   22   晋城"
      ]
     },
     "execution_count": 18,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "data = {\n",
    "    \"name\": [\"Tom\", \"Bob\", \"Mary\", \"James\", \"Yafei\"],\n",
    "    \"age\": [18, 30, 25, 40, 22],\n",
    "    \"city\": [\"北京\", \"上海\", \"广州\", \"深圳\", \"晋城\"]\n",
    "}\n",
    "user_info = pd.DataFrame.from_dict(data)\n",
    "user_info"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "当指定orient=‘index’时，将key值作为行名。（行排列）"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 19,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>0</th>\n",
       "      <th>1</th>\n",
       "      <th>2</th>\n",
       "      <th>3</th>\n",
       "      <th>4</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>name</th>\n",
       "      <td>Tom</td>\n",
       "      <td>Bob</td>\n",
       "      <td>Mary</td>\n",
       "      <td>James</td>\n",
       "      <td>Yafei</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>age</th>\n",
       "      <td>18</td>\n",
       "      <td>30</td>\n",
       "      <td>25</td>\n",
       "      <td>40</td>\n",
       "      <td>22</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>city</th>\n",
       "      <td>北京</td>\n",
       "      <td>上海</td>\n",
       "      <td>广州</td>\n",
       "      <td>深圳</td>\n",
       "      <td>晋城</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "        0    1     2      3      4\n",
       "name  Tom  Bob  Mary  James  Yafei\n",
       "age    18   30    25     40     22\n",
       "city   北京   上海    广州     深圳     晋城"
      ]
     },
     "execution_count": 19,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "user_info = pd.DataFrame.from_dict(data, orient='index')\n",
    "user_info"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "自定义列名"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 20,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>学生1</th>\n",
       "      <th>学生2</th>\n",
       "      <th>学生3</th>\n",
       "      <th>学生4</th>\n",
       "      <th>学生5</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>name</th>\n",
       "      <td>Tom</td>\n",
       "      <td>Bob</td>\n",
       "      <td>Mary</td>\n",
       "      <td>James</td>\n",
       "      <td>Yafei</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>age</th>\n",
       "      <td>18</td>\n",
       "      <td>30</td>\n",
       "      <td>25</td>\n",
       "      <td>40</td>\n",
       "      <td>22</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>city</th>\n",
       "      <td>北京</td>\n",
       "      <td>上海</td>\n",
       "      <td>广州</td>\n",
       "      <td>深圳</td>\n",
       "      <td>晋城</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "      学生1  学生2   学生3    学生4    学生5\n",
       "name  Tom  Bob  Mary  James  Yafei\n",
       "age    18   30    25     40     22\n",
       "city   北京   上海    广州     深圳     晋城"
      ]
     },
     "execution_count": 20,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "user_info = pd.DataFrame.from_dict(data, orient='index', columns=['学生1', '学生2', '学生3', '学生4', '学生5'])\n",
    "user_info"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "我们可以发现，from_dict可以将字典类型的数据转换为DataFrame对象，而之前用DataFrame的构造函数也可以直接实现这一功能，但是，此方法没有选择使用基于索引的方向。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 21,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>name</th>\n",
       "      <th>age</th>\n",
       "      <th>city</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>Tom</td>\n",
       "      <td>18</td>\n",
       "      <td>北京</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>Bob</td>\n",
       "      <td>30</td>\n",
       "      <td>上海</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>Mary</td>\n",
       "      <td>25</td>\n",
       "      <td>广州</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>James</td>\n",
       "      <td>40</td>\n",
       "      <td>深圳</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>Yafei</td>\n",
       "      <td>22</td>\n",
       "      <td>晋城</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "    name  age city\n",
       "0    Tom   18   北京\n",
       "1    Bob   30   上海\n",
       "2   Mary   25   广州\n",
       "3  James   40   深圳\n",
       "4  Yafei   22   晋城"
      ]
     },
     "execution_count": 21,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "data = {\n",
    "    \"name\": [\"Tom\", \"Bob\", \"Mary\", \"James\", \"Yafei\"],\n",
    "    \"age\": [18, 30, 25, 40, 22],\n",
    "    \"city\": [\"北京\", \"上海\", \"广州\", \"深圳\", \"晋城\"]\n",
    "}\n",
    "user_info = pd.DataFrame(data)\n",
    "user_info"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "因此，当您想要索引方向时，请使用from_dict（）方法。 对于默认方案，最好使用DataFrame构造函数。"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 方式6：from_record\n",
    "DataFrame.from_records()方法用于将结构化的数据或数组转换为DataFrame对象。 此方法接受以下参数：\n",
    "- data: 结构化的数据数组，元组或dict序列，或DataFrame结构化的输入数据。\n",
    "- index: 索引\n",
    "- exclude : 序列，要排除的列或索引\n",
    "- columns : 序列，列\n",
    "- coerce_float : bool, default False，强制转换为浮点型\n",
    "- nrows : int, default None,如果数据是迭代器，则要读取的行数。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 22,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>age</th>\n",
       "      <th>city</th>\n",
       "      <th>name</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>18</td>\n",
       "      <td>北京</td>\n",
       "      <td>Tom</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>30</td>\n",
       "      <td>上海</td>\n",
       "      <td>Bob</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>25</td>\n",
       "      <td>广州</td>\n",
       "      <td>Mary</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>40</td>\n",
       "      <td>深圳</td>\n",
       "      <td>James</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>22</td>\n",
       "      <td>晋城</td>\n",
       "      <td>Yafei</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   age city   name\n",
       "0   18   北京    Tom\n",
       "1   30   上海    Bob\n",
       "2   25   广州   Mary\n",
       "3   40   深圳  James\n",
       "4   22   晋城  Yafei"
      ]
     },
     "execution_count": 22,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "data = {\n",
    "    \"name\": [\"Tom\", \"Bob\", \"Mary\", \"James\", \"Yafei\"],\n",
    "    \"age\": [18, 30, 25, 40, 22],\n",
    "    \"city\": [\"北京\", \"上海\", \"广州\", \"深圳\", \"晋城\"]\n",
    "}\n",
    "user_info = pd.DataFrame.from_records(data)\n",
    "user_info"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 23,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>city</th>\n",
       "      <th>name</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>北京</td>\n",
       "      <td>Tom</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>上海</td>\n",
       "      <td>Bob</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>广州</td>\n",
       "      <td>Mary</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>深圳</td>\n",
       "      <td>James</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>晋城</td>\n",
       "      <td>Yafei</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "  city   name\n",
       "0   北京    Tom\n",
       "1   上海    Bob\n",
       "2   广州   Mary\n",
       "3   深圳  James\n",
       "4   晋城  Yafei"
      ]
     },
     "execution_count": 23,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "user_info = pd.DataFrame.from_records(data, exclude=['age'])\n",
    "user_info"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 24,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>age</th>\n",
       "      <th>city</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>Tom</th>\n",
       "      <td>18</td>\n",
       "      <td>北京</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Bob</th>\n",
       "      <td>30</td>\n",
       "      <td>上海</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Mary</th>\n",
       "      <td>25</td>\n",
       "      <td>广州</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>James</th>\n",
       "      <td>40</td>\n",
       "      <td>深圳</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "       age city\n",
       "Tom     18   北京\n",
       "Bob     30   上海\n",
       "Mary    25   广州\n",
       "James   40   深圳"
      ]
     },
     "execution_count": 24,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "data = [[18, \"北京\"],\n",
    "        [30, \"上海\"],\n",
    "        [25, \"广州\"],\n",
    "        [40, \"深圳\"]]\n",
    "index = ['Tom', 'Bob', 'Mary', 'James']\n",
    "user_info = pd.DataFrame.from_records(data, index=index, columns=['age', 'city'])\n",
    "user_info"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 方式7：从文件中读取\n",
    "数据分析过程中经常需要进行读写操作，Pandas实现了很多IO 操作的API，下面列举几个常见的常见的读取方法，其返回的数据类型均为DataFrame：\n",
    "- read_csv: 读取csv文件\n",
    "- read_excel: 读取excel文件\n",
    "- read_html: 读取html文件\n",
    "- read_sql: 读取sql数据\n",
    "- read_json: 读取json数据"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 25,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>姓名</th>\n",
       "      <th>课程</th>\n",
       "      <th>学期</th>\n",
       "      <th>成绩</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>王大伟</td>\n",
       "      <td>大学英语</td>\n",
       "      <td>1</td>\n",
       "      <td>92</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>王大伟</td>\n",
       "      <td>大学英语</td>\n",
       "      <td>2</td>\n",
       "      <td>85</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>王大伟</td>\n",
       "      <td>大学英语</td>\n",
       "      <td>3</td>\n",
       "      <td>83</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>王大伟</td>\n",
       "      <td>大学英语</td>\n",
       "      <td>4</td>\n",
       "      <td>90</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>王大伟</td>\n",
       "      <td>高等数学</td>\n",
       "      <td>1</td>\n",
       "      <td>91</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "    姓名    课程  学期  成绩\n",
       "0  王大伟  大学英语   1  92\n",
       "1  王大伟  大学英语   2  85\n",
       "2  王大伟  大学英语   3  83\n",
       "3  王大伟  大学英语   4  90\n",
       "4  王大伟  高等数学   1  91"
      ]
     },
     "execution_count": 25,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df = pd.read_excel('data/scores.xlsx')\n",
    "df.head()  # 查看前5行"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 26,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>姓名</th>\n",
       "      <th>课程</th>\n",
       "      <th>学期</th>\n",
       "      <th>成绩</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>31</th>\n",
       "      <td>张明</td>\n",
       "      <td>高等数学</td>\n",
       "      <td>4</td>\n",
       "      <td>86</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>32</th>\n",
       "      <td>张明</td>\n",
       "      <td>大学体育</td>\n",
       "      <td>1</td>\n",
       "      <td>87</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>33</th>\n",
       "      <td>张明</td>\n",
       "      <td>大学体育</td>\n",
       "      <td>2</td>\n",
       "      <td>85</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>34</th>\n",
       "      <td>张明</td>\n",
       "      <td>大学体育</td>\n",
       "      <td>3</td>\n",
       "      <td>86</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>35</th>\n",
       "      <td>张明</td>\n",
       "      <td>大学体育</td>\n",
       "      <td>4</td>\n",
       "      <td>92</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "    姓名    课程  学期  成绩\n",
       "31  张明  高等数学   4  86\n",
       "32  张明  大学体育   1  87\n",
       "33  张明  大学体育   2  85\n",
       "34  张明  大学体育   3  86\n",
       "35  张明  大学体育   4  92"
      ]
     },
     "execution_count": 26,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.tail()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "关于文件读取与写入相关知识，之后将会在文件IO专题部分讲解。"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 总结\n",
    "关于DataFrame的创建，在pandas中为我们提供了多种方法，虽然方法较多，但是这也有另一个弊端，就是不知道选哪种，如何选的问题。我个人的使用经验来看，一般是**列表套字典**或**字典套列表**的方式采用DataFrame构造函数直接创建，当然，涉及文件读取的话，各种读取函数实际应用场景也相当多，这一部分就在之后介绍。"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 第3部分：DataFrame的属性\n",
    "基本属性如下：\n",
    "- df.shape   : 查看形状\n",
    "- df.columns  : 查看列名\n",
    "- df.index    : 查看索引\n",
    "- df.dtype   : 查看列数据类型\n",
    "- df.ndim    : 查看数据维度\n",
    "- df.T      : 转置  \n",
    "- df.values   : 将DataFrane转为数组类型"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 27,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>age</th>\n",
       "      <th>city</th>\n",
       "      <th>sex</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>name</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>Tom</th>\n",
       "      <td>18</td>\n",
       "      <td>北京</td>\n",
       "      <td>男</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Bob</th>\n",
       "      <td>30</td>\n",
       "      <td>上海</td>\n",
       "      <td>男</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Mary</th>\n",
       "      <td>25</td>\n",
       "      <td>广州</td>\n",
       "      <td>女</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>James</th>\n",
       "      <td>40</td>\n",
       "      <td>深圳</td>\n",
       "      <td>男</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Yafei</th>\n",
       "      <td>22</td>\n",
       "      <td>晋城</td>\n",
       "      <td>男</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "       age city sex\n",
       "name               \n",
       "Tom     18   北京   男\n",
       "Bob     30   上海   男\n",
       "Mary    25   广州   女\n",
       "James   40   深圳   男\n",
       "Yafei   22   晋城   男"
      ]
     },
     "execution_count": 27,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# 指定索引(index)\n",
    "data = {\n",
    "    \"age\": [18, 30, 25, 40, 22],\n",
    "    \"city\": [\"北京\", \"上海\", \"广州\", \"深圳\", \"晋城\"],\n",
    "    'sex': ['男', '男', '女', '男', '男']\n",
    "}\n",
    "index = pd.Index([\"Tom\", \"Bob\", \"Mary\", \"James\", \"Yafei\"], name='name')  # pd.Index将列表转换为具有轴标签的索引列表\n",
    "user_info = pd.DataFrame(data, index=index)\n",
    "user_info"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 28,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "(5, 3)"
      ]
     },
     "execution_count": 28,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# 查看形状，即多少行多少列\n",
    "user_info.shape"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 29,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "Index(['age', 'city', 'sex'], dtype='object')"
      ]
     },
     "execution_count": 29,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# 查看列名\n",
    "user_info.columns"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 30,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "Index(['Tom', 'Bob', 'Mary', 'James', 'Yafei'], dtype='object', name='name')"
      ]
     },
     "execution_count": 30,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# 查看索引\n",
    "user_info.index"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 31,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "age      int64\n",
       "city    object\n",
       "sex     object\n",
       "dtype: object"
      ]
     },
     "execution_count": 31,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# 查看各列数据类型\n",
    "user_info.dtypes"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 32,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "2"
      ]
     },
     "execution_count": 32,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# 查看维度\n",
    "user_info.ndim"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 33,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th>name</th>\n",
       "      <th>Tom</th>\n",
       "      <th>Bob</th>\n",
       "      <th>Mary</th>\n",
       "      <th>James</th>\n",
       "      <th>Yafei</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>age</th>\n",
       "      <td>18</td>\n",
       "      <td>30</td>\n",
       "      <td>25</td>\n",
       "      <td>40</td>\n",
       "      <td>22</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>city</th>\n",
       "      <td>北京</td>\n",
       "      <td>上海</td>\n",
       "      <td>广州</td>\n",
       "      <td>深圳</td>\n",
       "      <td>晋城</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>sex</th>\n",
       "      <td>男</td>\n",
       "      <td>男</td>\n",
       "      <td>女</td>\n",
       "      <td>男</td>\n",
       "      <td>男</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "name Tom Bob Mary James Yafei\n",
       "age   18  30   25    40    22\n",
       "city  北京  上海   广州    深圳    晋城\n",
       "sex    男   男    女     男     男"
      ]
     },
     "execution_count": 33,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "user_info.T"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 34,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([[18, '北京', '男'],\n",
       "       [30, '上海', '男'],\n",
       "       [25, '广州', '女'],\n",
       "       [40, '深圳', '男'],\n",
       "       [22, '晋城', '男']], dtype=object)"
      ]
     },
     "execution_count": 34,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# 将DataFrame数据转为数组\n",
    "user_info.values"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 第4部分：DataFrame的方法"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 基本方法\n",
    "- df.head(n): 查看前n行数据，默认为5\n",
    "- df.tail(n): 查看后n行数据，默认为5\n",
    "- df.info(): 查看整体情况\n",
    "- df.select_dtypes(include=['float64'])   # 选择特定类型的列  "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 35,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>age</th>\n",
       "      <th>city</th>\n",
       "      <th>sex</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>name</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>Tom</th>\n",
       "      <td>18</td>\n",
       "      <td>北京</td>\n",
       "      <td>男</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Bob</th>\n",
       "      <td>30</td>\n",
       "      <td>上海</td>\n",
       "      <td>男</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Mary</th>\n",
       "      <td>25</td>\n",
       "      <td>广州</td>\n",
       "      <td>女</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "      age city sex\n",
       "name              \n",
       "Tom    18   北京   男\n",
       "Bob    30   上海   男\n",
       "Mary   25   广州   女"
      ]
     },
     "execution_count": 35,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "user_info.head(3)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 36,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>age</th>\n",
       "      <th>city</th>\n",
       "      <th>sex</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>name</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>Mary</th>\n",
       "      <td>25</td>\n",
       "      <td>广州</td>\n",
       "      <td>女</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>James</th>\n",
       "      <td>40</td>\n",
       "      <td>深圳</td>\n",
       "      <td>男</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Yafei</th>\n",
       "      <td>22</td>\n",
       "      <td>晋城</td>\n",
       "      <td>男</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "       age city sex\n",
       "name               \n",
       "Mary    25   广州   女\n",
       "James   40   深圳   男\n",
       "Yafei   22   晋城   男"
      ]
     },
     "execution_count": 36,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "user_info.tail(3)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 37,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "<class 'pandas.core.frame.DataFrame'>\n",
      "Index: 5 entries, Tom to Yafei\n",
      "Data columns (total 3 columns):\n",
      " #   Column  Non-Null Count  Dtype \n",
      "---  ------  --------------  ----- \n",
      " 0   age     5 non-null      int64 \n",
      " 1   city    5 non-null      object\n",
      " 2   sex     5 non-null      object\n",
      "dtypes: int64(1), object(2)\n",
      "memory usage: 320.0+ bytes\n"
     ]
    }
   ],
   "source": [
    "user_info.info()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 38,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>age</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>name</th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>Tom</th>\n",
       "      <td>18</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Bob</th>\n",
       "      <td>30</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Mary</th>\n",
       "      <td>25</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>James</th>\n",
       "      <td>40</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Yafei</th>\n",
       "      <td>22</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "       age\n",
       "name      \n",
       "Tom     18\n",
       "Bob     30\n",
       "Mary    25\n",
       "James   40\n",
       "Yafei   22"
      ]
     },
     "execution_count": 38,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "user_info.select_dtypes(include='int64')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 描述与统计\n",
    "整体描述与统计\n",
    ">df.describe() #查看数字类型的列整体概况  \n",
    " df.describe(include=['object']) #查看非数字类型的列的整体情况  \n",
    " df.groupby(col1)[col2].count()      \n",
    "\n",
    "单列描述与统计：通过df[col]选择该列数据，数据类型为Series，然后可以调用上一节介绍的Series的方法\n",
    "> df[col].value_counts() #统计某列中每个值出现的次数，相当于分组   \n",
    " df[col].sum(): 对某列数据求和  \n",
    " ...  "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 39,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>age</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>count</th>\n",
       "      <td>5.000000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>mean</th>\n",
       "      <td>27.000000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>std</th>\n",
       "      <td>8.485281</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>min</th>\n",
       "      <td>18.000000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>25%</th>\n",
       "      <td>22.000000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>50%</th>\n",
       "      <td>25.000000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>75%</th>\n",
       "      <td>30.000000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>max</th>\n",
       "      <td>40.000000</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "             age\n",
       "count   5.000000\n",
       "mean   27.000000\n",
       "std     8.485281\n",
       "min    18.000000\n",
       "25%    22.000000\n",
       "50%    25.000000\n",
       "75%    30.000000\n",
       "max    40.000000"
      ]
     },
     "execution_count": 39,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "user_info.describe()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 40,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>city</th>\n",
       "      <th>sex</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>count</th>\n",
       "      <td>5</td>\n",
       "      <td>5</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>unique</th>\n",
       "      <td>5</td>\n",
       "      <td>2</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>top</th>\n",
       "      <td>上海</td>\n",
       "      <td>男</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>freq</th>\n",
       "      <td>1</td>\n",
       "      <td>4</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "       city sex\n",
       "count     5   5\n",
       "unique    5   2\n",
       "top      上海   男\n",
       "freq      1   4"
      ]
     },
     "execution_count": 40,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "user_info.describe(include=['object'])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 41,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "sex\n",
       "女    1\n",
       "男    4\n",
       "Name: age, dtype: int64"
      ]
     },
     "execution_count": 41,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# 对年龄按照性别进行分组统计\n",
    "user_info.groupby('sex')['age'].count()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "分组与统计我将会在之后的专题章节中进行讲解，这里只需要初步了解"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 42,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "name\n",
       "Tom      男\n",
       "Bob      男\n",
       "Mary     女\n",
       "James    男\n",
       "Yafei    男\n",
       "Name: sex, dtype: object"
      ]
     },
     "execution_count": 42,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# 选择某列数据\n",
    "sex = user_info['sex']\n",
    "sex"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "对该列数据进行统计"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 43,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "男    4\n",
       "女    1\n",
       "Name: sex, dtype: int64"
      ]
     },
     "execution_count": 43,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# 对sex进行频次统计\n",
    "sex.value_counts()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 44,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "135"
      ]
     },
     "execution_count": 44,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# 对age列进行求和\n",
    "user_info['age'].sum()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "关于Series的其他统计方法请查找上一节内容"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 离散化\n",
    "- cut(x,bins, right: bool = True,labels=None,\n",
    "    retbins: bool = False,\n",
    "    precision: int = 3,\n",
    "    include_lowest: bool = False,\n",
    "    duplicates: str = \"raise\",\n",
    "    ordered: bool = True,\n",
    ")  \n",
    "\n",
    "- qcut(\n",
    "    x,\n",
    "    q,\n",
    "    labels=None,\n",
    "    retbins: bool = False,\n",
    "    precision: int = 3,\n",
    "    duplicates: str = \"raise\",\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 45,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "name\n",
       "Tom      (17.978, 25.333]\n",
       "Bob      (25.333, 32.667]\n",
       "Mary     (17.978, 25.333]\n",
       "James      (32.667, 40.0]\n",
       "Yafei    (17.978, 25.333]\n",
       "Name: age, dtype: category\n",
       "Categories (3, interval[float64]): [(17.978, 25.333] < (25.333, 32.667] < (32.667, 40.0]]"
      ]
     },
     "execution_count": 45,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "pd.cut(user_info.age, 3)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 46,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "name\n",
       "Tom       (1, 18]\n",
       "Bob      (18, 30]\n",
       "Mary     (18, 30]\n",
       "James    (30, 50]\n",
       "Yafei    (18, 30]\n",
       "Name: age, dtype: category\n",
       "Categories (3, interval[int64]): [(1, 18] < (18, 30] < (30, 50]]"
      ]
     },
     "execution_count": 46,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "pd.cut(user_info.age, [1, 18, 30, 50])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 47,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "name\n",
       "Tom      childhood\n",
       "Bob          youth\n",
       "Mary         youth\n",
       "James       middle\n",
       "Yafei        youth\n",
       "Name: age, dtype: category\n",
       "Categories (3, object): ['childhood' < 'youth' < 'middle']"
      ]
     },
     "execution_count": 47,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "pd.cut(user_info.age, [1, 18, 30, 50], labels=[\"childhood\", \"youth\", \"middle\"])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "cut 进行离散化之外，qcut 也可以实现离散化。cut 是根据每个值的大小来进行离散化的，qcut 是根据每个值出现的次数来进行离散化的。b"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 48,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "name\n",
       "Tom      (17.999, 23.0]\n",
       "Bob      (28.333, 40.0]\n",
       "Mary     (23.0, 28.333]\n",
       "James    (28.333, 40.0]\n",
       "Yafei    (17.999, 23.0]\n",
       "Name: age, dtype: category\n",
       "Categories (3, interval[float64]): [(17.999, 23.0] < (23.0, 28.333] < (28.333, 40.0]]"
      ]
     },
     "execution_count": 48,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "pd.qcut(user_info.age, 3)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 排序\n",
    "在进行数据分析时，少不了进行数据排序。Pandas 支持两种排序方式：按轴（索引或列）排序和按实际值排序。\n",
    "- sort_index 方法是按照标签（行标签或列标签）进行排序\n",
    "- sort_values 方法是按照某一列或多列的元素值进行排序\n",
    "\n",
    "可能你已经发现了，这两个方法Series中也有，功能也极其相似，不同的是一个排序对象是Series，一个排序对象是DataFrame"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 49,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>age</th>\n",
       "      <th>city</th>\n",
       "      <th>sex</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>name</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>Bob</th>\n",
       "      <td>30</td>\n",
       "      <td>上海</td>\n",
       "      <td>男</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>James</th>\n",
       "      <td>40</td>\n",
       "      <td>深圳</td>\n",
       "      <td>男</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Mary</th>\n",
       "      <td>25</td>\n",
       "      <td>广州</td>\n",
       "      <td>女</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Tom</th>\n",
       "      <td>18</td>\n",
       "      <td>北京</td>\n",
       "      <td>男</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Yafei</th>\n",
       "      <td>22</td>\n",
       "      <td>晋城</td>\n",
       "      <td>男</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "       age city sex\n",
       "name               \n",
       "Bob     30   上海   男\n",
       "James   40   深圳   男\n",
       "Mary    25   广州   女\n",
       "Tom     18   北京   男\n",
       "Yafei   22   晋城   男"
      ]
     },
     "execution_count": 49,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "user_info.sort_index()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 50,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>sex</th>\n",
       "      <th>city</th>\n",
       "      <th>age</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>name</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>Tom</th>\n",
       "      <td>男</td>\n",
       "      <td>北京</td>\n",
       "      <td>18</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Bob</th>\n",
       "      <td>男</td>\n",
       "      <td>上海</td>\n",
       "      <td>30</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Mary</th>\n",
       "      <td>女</td>\n",
       "      <td>广州</td>\n",
       "      <td>25</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>James</th>\n",
       "      <td>男</td>\n",
       "      <td>深圳</td>\n",
       "      <td>40</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Yafei</th>\n",
       "      <td>男</td>\n",
       "      <td>晋城</td>\n",
       "      <td>22</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "      sex city  age\n",
       "name               \n",
       "Tom     男   北京   18\n",
       "Bob     男   上海   30\n",
       "Mary    女   广州   25\n",
       "James   男   深圳   40\n",
       "Yafei   男   晋城   22"
      ]
     },
     "execution_count": 50,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "user_info.sort_index(axis=1, ascending=False) # 按照列进行倒序排"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 51,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>age</th>\n",
       "      <th>city</th>\n",
       "      <th>sex</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>name</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>Tom</th>\n",
       "      <td>18</td>\n",
       "      <td>北京</td>\n",
       "      <td>男</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Yafei</th>\n",
       "      <td>22</td>\n",
       "      <td>晋城</td>\n",
       "      <td>男</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Mary</th>\n",
       "      <td>25</td>\n",
       "      <td>广州</td>\n",
       "      <td>女</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Bob</th>\n",
       "      <td>30</td>\n",
       "      <td>上海</td>\n",
       "      <td>男</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>James</th>\n",
       "      <td>40</td>\n",
       "      <td>深圳</td>\n",
       "      <td>男</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "       age city sex\n",
       "name               \n",
       "Tom     18   北京   男\n",
       "Yafei   22   晋城   男\n",
       "Mary    25   广州   女\n",
       "Bob     30   上海   男\n",
       "James   40   深圳   男"
      ]
     },
     "execution_count": 51,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# 按照age列从小到大来排序\n",
    "user_info.sort_values(by=\"age\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 52,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>age</th>\n",
       "      <th>city</th>\n",
       "      <th>sex</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>name</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>Tom</th>\n",
       "      <td>18</td>\n",
       "      <td>北京</td>\n",
       "      <td>男</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Yafei</th>\n",
       "      <td>22</td>\n",
       "      <td>晋城</td>\n",
       "      <td>男</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Mary</th>\n",
       "      <td>25</td>\n",
       "      <td>广州</td>\n",
       "      <td>女</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Bob</th>\n",
       "      <td>30</td>\n",
       "      <td>上海</td>\n",
       "      <td>男</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>James</th>\n",
       "      <td>40</td>\n",
       "      <td>深圳</td>\n",
       "      <td>男</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "       age city sex\n",
       "name               \n",
       "Tom     18   北京   男\n",
       "Yafei   22   晋城   男\n",
       "Mary    25   广州   女\n",
       "Bob     30   上海   男\n",
       "James   40   深圳   男"
      ]
     },
     "execution_count": 52,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# 按照age和sex进行排序，优先排age，当age相同时，按sex排\n",
    "user_info.sort_values(by=[\"age\", \"sex\"])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "一般在排序后，我们可能需要获取最大的n个值或最小值的n个值，我们可以使用 nlargest和\n",
    "nsmallest 方法来完成，这比先进行排序，再使用 head(n) 方法快得多。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 53,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "name\n",
       "James    40\n",
       "Bob      30\n",
       "Mary     25\n",
       "Name: age, dtype: int64"
      ]
     },
     "execution_count": 53,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "user_info['age'].nlargest(3)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 54,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "name\n",
       "Tom      18\n",
       "Yafei    22\n",
       "Mary     25\n",
       "Name: age, dtype: int64"
      ]
     },
     "execution_count": 54,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "user_info['age'].nsmallest(3)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 类型操作\n",
    "如果想要获取每种类型的列数的话，可以使用 df.dtypes 方法。  \n",
    "- 如果想要转换数据类型的话，可以通过 astype 来完成。\n",
    "- 有时候会涉及到将 object 类型转为其他类型，常见的有转为数字、日期、时间差，Pandas 中分别对应 to_numeric、to_datetime、to_timedelta 方法。"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "这里给这些用户都添加一些关于分数的信息，输入时以文本格式输入。现在将分数这一列转为数字，很明显，100分 并非数字，为了强制转换，我们可以传入 errors 参数，这个参数的作用是当强转失败时的处理方式。默认情况下，errors='raise'，这意味着强转失败后直接抛出异常，设置 errors='coerce' 可以在强转失败时将有问题的元素赋值为 pd.NaT（对于datetime和timedelta）或 np.nan（数字），设置 errors='ignore' 可以在强转失败时返回原有的数据。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 55,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "age      int64\n",
       "city    object\n",
       "sex     object\n",
       "dtype: object"
      ]
     },
     "execution_count": 55,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "user_info.dtypes"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 56,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "object    2\n",
       "int64     1\n",
       "dtype: int64"
      ]
     },
     "execution_count": 56,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "user_info.dtypes.value_counts()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 57,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "name\n",
       "Tom      18.0\n",
       "Bob      30.0\n",
       "Mary     25.0\n",
       "James    40.0\n",
       "Yafei    22.0\n",
       "Name: age, dtype: float64"
      ]
     },
     "execution_count": 57,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "user_info['age'].astype(float)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 58,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>age</th>\n",
       "      <th>city</th>\n",
       "      <th>sex</th>\n",
       "      <th>score</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>name</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>Tom</th>\n",
       "      <td>18</td>\n",
       "      <td>北京</td>\n",
       "      <td>男</td>\n",
       "      <td>98</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Bob</th>\n",
       "      <td>30</td>\n",
       "      <td>上海</td>\n",
       "      <td>男</td>\n",
       "      <td>78</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Mary</th>\n",
       "      <td>25</td>\n",
       "      <td>广州</td>\n",
       "      <td>女</td>\n",
       "      <td>68</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>James</th>\n",
       "      <td>40</td>\n",
       "      <td>深圳</td>\n",
       "      <td>男</td>\n",
       "      <td>88</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Yafei</th>\n",
       "      <td>22</td>\n",
       "      <td>晋城</td>\n",
       "      <td>男</td>\n",
       "      <td>100分</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "       age city sex score\n",
       "name                     \n",
       "Tom     18   北京   男    98\n",
       "Bob     30   上海   男    78\n",
       "Mary    25   广州   女    68\n",
       "James   40   深圳   男    88\n",
       "Yafei   22   晋城   男  100分"
      ]
     },
     "execution_count": 58,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "user_info[\"score\"] = [\"98\", \"78\", \"68\", \"88\", \"100分\"]\n",
    "user_info"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 59,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "age       int64\n",
       "city     object\n",
       "sex      object\n",
       "score    object\n",
       "dtype: object"
      ]
     },
     "execution_count": 59,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "user_info.dtypes"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 60,
   "metadata": {},
   "outputs": [],
   "source": [
    "# pd.to_numeric(user_info.score, errors=\"raise\")  # 如果有值转换不成功会报错\n",
    "\n",
    "# ---------------------------------------------------------------------------\n",
    "# ValueError                                Traceback (most recent call last)\n",
    "# pandas\\_libs\\lib.pyx in pandas._libs.lib.maybe_convert_numeric()\n",
    "\n",
    "# ValueError: Unable to parse string \"100分\"\n",
    "\n",
    "# During handling of the above exception, another exception occurred:\n",
    "\n",
    "# ValueError                                Traceback (most recent call last)\n",
    "# <ipython-input-60-ff1e33582f2f> in <module>\n",
    "# ----> 1 pd.to_numeric(user_info.score, errors=\"raise\")\n",
    "\n",
    "# F:\\Anaconda3\\lib\\site-packages\\pandas\\core\\tools\\numeric.py in to_numeric(arg, errors, downcast)\n",
    "#     150         coerce_numeric = errors not in (\"ignore\", \"raise\")\n",
    "#     151         try:\n",
    "# --> 152             values = lib.maybe_convert_numeric(\n",
    "#     153                 values, set(), coerce_numeric=coerce_numeric\n",
    "#     154             )\n",
    "\n",
    "# pandas\\_libs\\lib.pyx in pandas._libs.lib.maybe_convert_numeric()\n",
    "\n",
    "# ValueError: Unable to parse string \"100分\" at position 4"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": []
  },
  {
   "cell_type": "code",
   "execution_count": 61,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "name\n",
       "Tom      98.0\n",
       "Bob      78.0\n",
       "Mary     68.0\n",
       "James    88.0\n",
       "Yafei     NaN\n",
       "Name: score, dtype: float64"
      ]
     },
     "execution_count": 61,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "pd.to_numeric(user_info.score, errors=\"coerce\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 62,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "name\n",
       "Tom        98\n",
       "Bob        78\n",
       "Mary       68\n",
       "James      88\n",
       "Yafei    100分\n",
       "Name: score, dtype: object"
      ]
     },
     "execution_count": 62,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "pd.to_numeric(user_info.score, errors=\"ignore\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 索引行函数"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 63,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>age</th>\n",
       "      <th>city</th>\n",
       "      <th>sex</th>\n",
       "      <th>score</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>name</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>Tom</th>\n",
       "      <td>18</td>\n",
       "      <td>北京</td>\n",
       "      <td>男</td>\n",
       "      <td>98</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Bob</th>\n",
       "      <td>30</td>\n",
       "      <td>上海</td>\n",
       "      <td>男</td>\n",
       "      <td>78</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Mary</th>\n",
       "      <td>25</td>\n",
       "      <td>广州</td>\n",
       "      <td>女</td>\n",
       "      <td>68</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>James</th>\n",
       "      <td>40</td>\n",
       "      <td>深圳</td>\n",
       "      <td>男</td>\n",
       "      <td>88</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Yafei</th>\n",
       "      <td>22</td>\n",
       "      <td>晋城</td>\n",
       "      <td>男</td>\n",
       "      <td>100分</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "       age city sex score\n",
       "name                     \n",
       "Tom     18   北京   男    98\n",
       "Bob     30   上海   男    78\n",
       "Mary    25   广州   女    68\n",
       "James   40   深圳   男    88\n",
       "Yafei   22   晋城   男  100分"
      ]
     },
     "execution_count": 63,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "user_info"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### where\n",
    "> where(cond,other=np.nan,inplace=False,axis=None,level=None,errors=\"raise\",try_cast=False,)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "默认不满足条件的行全部被设置为NaN"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 64,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>age</th>\n",
       "      <th>city</th>\n",
       "      <th>sex</th>\n",
       "      <th>score</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>name</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>Tom</th>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Bob</th>\n",
       "      <td>30.0</td>\n",
       "      <td>上海</td>\n",
       "      <td>男</td>\n",
       "      <td>78</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Mary</th>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>James</th>\n",
       "      <td>40.0</td>\n",
       "      <td>深圳</td>\n",
       "      <td>男</td>\n",
       "      <td>88</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Yafei</th>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "        age city  sex score\n",
       "name                       \n",
       "Tom     NaN  NaN  NaN   NaN\n",
       "Bob    30.0   上海    男    78\n",
       "Mary    NaN  NaN  NaN   NaN\n",
       "James  40.0   深圳    男    88\n",
       "Yafei   NaN  NaN  NaN   NaN"
      ]
     },
     "execution_count": 64,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "user_info.where(user_info['age'] > 25)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "通过这种方法筛选结果和[]操作符的结果完全一致："
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 65,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>age</th>\n",
       "      <th>city</th>\n",
       "      <th>sex</th>\n",
       "      <th>score</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>name</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>Bob</th>\n",
       "      <td>30.0</td>\n",
       "      <td>上海</td>\n",
       "      <td>男</td>\n",
       "      <td>78</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>James</th>\n",
       "      <td>40.0</td>\n",
       "      <td>深圳</td>\n",
       "      <td>男</td>\n",
       "      <td>88</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "        age city sex score\n",
       "name                      \n",
       "Bob    30.0   上海   男    78\n",
       "James  40.0   深圳   男    88"
      ]
     },
     "execution_count": 65,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "user_info.where(user_info['age'] > 25).dropna()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 66,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>age</th>\n",
       "      <th>city</th>\n",
       "      <th>sex</th>\n",
       "      <th>score</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>name</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>Bob</th>\n",
       "      <td>30</td>\n",
       "      <td>上海</td>\n",
       "      <td>男</td>\n",
       "      <td>78</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>James</th>\n",
       "      <td>40</td>\n",
       "      <td>深圳</td>\n",
       "      <td>男</td>\n",
       "      <td>88</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "       age city sex score\n",
       "name                     \n",
       "Bob     30   上海   男    78\n",
       "James   40   深圳   男    88"
      ]
     },
     "execution_count": 66,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "user_info[user_info['age'] > 25]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "第一个参数为布尔条件，第二个参数为填充值："
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 67,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>age</th>\n",
       "      <th>city</th>\n",
       "      <th>sex</th>\n",
       "      <th>score</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>name</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>Tom</th>\n",
       "      <td>25</td>\n",
       "      <td>25</td>\n",
       "      <td>25</td>\n",
       "      <td>25</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Bob</th>\n",
       "      <td>30</td>\n",
       "      <td>上海</td>\n",
       "      <td>男</td>\n",
       "      <td>78</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Mary</th>\n",
       "      <td>25</td>\n",
       "      <td>25</td>\n",
       "      <td>25</td>\n",
       "      <td>25</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>James</th>\n",
       "      <td>40</td>\n",
       "      <td>深圳</td>\n",
       "      <td>男</td>\n",
       "      <td>88</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Yafei</th>\n",
       "      <td>25</td>\n",
       "      <td>25</td>\n",
       "      <td>25</td>\n",
       "      <td>25</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "       age city sex score\n",
       "name                     \n",
       "Tom     25   25  25    25\n",
       "Bob     30   上海   男    78\n",
       "Mary    25   25  25    25\n",
       "James   40   深圳   男    88\n",
       "Yafei   25   25  25    25"
      ]
     },
     "execution_count": 67,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "user_info.where(user_info['age'] > 25, 25) # age小于25全部设置为25"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### mask\n",
    "> mask( cond,other=np.nan,inplace=False,axis=None,level=None,errors=\"raise\",try_cast=False)\n",
    "\n",
    "mask函数与where功能上相反，其余完全一致，即对条件为True的单元进行填充。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 68,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>age</th>\n",
       "      <th>city</th>\n",
       "      <th>sex</th>\n",
       "      <th>score</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>name</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>Tom</th>\n",
       "      <td>18.0</td>\n",
       "      <td>北京</td>\n",
       "      <td>男</td>\n",
       "      <td>98</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Bob</th>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Mary</th>\n",
       "      <td>25.0</td>\n",
       "      <td>广州</td>\n",
       "      <td>女</td>\n",
       "      <td>68</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>James</th>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Yafei</th>\n",
       "      <td>22.0</td>\n",
       "      <td>晋城</td>\n",
       "      <td>男</td>\n",
       "      <td>100分</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "        age city  sex score\n",
       "name                       \n",
       "Tom    18.0   北京    男    98\n",
       "Bob     NaN  NaN  NaN   NaN\n",
       "Mary   25.0   广州    女    68\n",
       "James   NaN  NaN  NaN   NaN\n",
       "Yafei  22.0   晋城    男  100分"
      ]
     },
     "execution_count": 68,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "user_info.mask(user_info['age'] > 25) "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 69,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>age</th>\n",
       "      <th>city</th>\n",
       "      <th>sex</th>\n",
       "      <th>score</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>name</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>Tom</th>\n",
       "      <td>18</td>\n",
       "      <td>北京</td>\n",
       "      <td>男</td>\n",
       "      <td>98</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Bob</th>\n",
       "      <td>25</td>\n",
       "      <td>25</td>\n",
       "      <td>25</td>\n",
       "      <td>25</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Mary</th>\n",
       "      <td>25</td>\n",
       "      <td>广州</td>\n",
       "      <td>女</td>\n",
       "      <td>68</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>James</th>\n",
       "      <td>25</td>\n",
       "      <td>25</td>\n",
       "      <td>25</td>\n",
       "      <td>25</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Yafei</th>\n",
       "      <td>22</td>\n",
       "      <td>晋城</td>\n",
       "      <td>男</td>\n",
       "      <td>100分</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "       age city sex score\n",
       "name                     \n",
       "Tom     18   北京   男    98\n",
       "Bob     25   25  25    25\n",
       "Mary    25   广州   女    68\n",
       "James   25   25  25    25\n",
       "Yafei   22   晋城   男  100分"
      ]
     },
     "execution_count": 69,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "user_info.mask(user_info['age'] > 25, 25) "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### query\n",
    "> query(self, expr, inplace=False, **kwargs)\n",
    "\n",
    "query函数中的布尔表达式中，下面的符号都是合法的：行列索引名、字符串、and/not/or/&/|/~/not in/in/==/!=、四则运算符"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 70,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>age</th>\n",
       "      <th>city</th>\n",
       "      <th>sex</th>\n",
       "      <th>score</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>name</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>Bob</th>\n",
       "      <td>30</td>\n",
       "      <td>上海</td>\n",
       "      <td>男</td>\n",
       "      <td>78</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Mary</th>\n",
       "      <td>25</td>\n",
       "      <td>广州</td>\n",
       "      <td>女</td>\n",
       "      <td>68</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>James</th>\n",
       "      <td>40</td>\n",
       "      <td>深圳</td>\n",
       "      <td>男</td>\n",
       "      <td>88</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Yafei</th>\n",
       "      <td>22</td>\n",
       "      <td>晋城</td>\n",
       "      <td>男</td>\n",
       "      <td>100分</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "       age city sex score\n",
       "name                     \n",
       "Bob     30   上海   男    78\n",
       "Mary    25   广州   女    68\n",
       "James   40   深圳   男    88\n",
       "Yafei   22   晋城   男  100分"
      ]
     },
     "execution_count": 70,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "user_info.query('age > 20')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 71,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>age</th>\n",
       "      <th>city</th>\n",
       "      <th>sex</th>\n",
       "      <th>score</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>name</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>Bob</th>\n",
       "      <td>30</td>\n",
       "      <td>上海</td>\n",
       "      <td>男</td>\n",
       "      <td>78</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>James</th>\n",
       "      <td>40</td>\n",
       "      <td>深圳</td>\n",
       "      <td>男</td>\n",
       "      <td>88</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Yafei</th>\n",
       "      <td>22</td>\n",
       "      <td>晋城</td>\n",
       "      <td>男</td>\n",
       "      <td>100分</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "       age city sex score\n",
       "name                     \n",
       "Bob     30   上海   男    78\n",
       "James   40   深圳   男    88\n",
       "Yafei   22   晋城   男  100分"
      ]
     },
     "execution_count": 71,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "user_info.query('(age > 20)&(sex == \"男\")')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 重复元素处理"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### duplicated方法\n",
    "> duplicated(subset, keep='first')\n",
    "\n",
    "- subset: 判断重复的列，默认为全部列,单列为字符串或列表类型，多列为列表类型\n",
    "- keep: 重复的行标注为True的方式，first,last和False.False是所有重复元素均为True"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 72,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "name\n",
       "Tom      False\n",
       "Bob       True\n",
       "Mary     False\n",
       "James     True\n",
       "Yafei     True\n",
       "dtype: bool"
      ]
     },
     "execution_count": 72,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "user_info.duplicated(subset='sex')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "可选参数keep默认为first，即首次出现设为不重复，若为last，则最后一次设为不重复，若为False，则所有重复项为True"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 73,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "name\n",
       "Tom       True\n",
       "Bob       True\n",
       "Mary     False\n",
       "James     True\n",
       "Yafei    False\n",
       "dtype: bool"
      ]
     },
     "execution_count": 73,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "user_info.duplicated(subset='sex', keep='last')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 74,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "name\n",
       "Tom       True\n",
       "Bob       True\n",
       "Mary     False\n",
       "James     True\n",
       "Yafei     True\n",
       "dtype: bool"
      ]
     },
     "execution_count": 74,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "user_info.duplicated(subset='sex', keep=False)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### drop_duplicates方法\n",
    "> drop_duplicates(subset: ,keep=\"first\",inplace=False,ignore_index=False) "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "删除重复项，默认为只保留所有重复元素的第一个，其余都删除"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 75,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>age</th>\n",
       "      <th>city</th>\n",
       "      <th>sex</th>\n",
       "      <th>score</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>name</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>Tom</th>\n",
       "      <td>18</td>\n",
       "      <td>北京</td>\n",
       "      <td>男</td>\n",
       "      <td>98</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Mary</th>\n",
       "      <td>25</td>\n",
       "      <td>广州</td>\n",
       "      <td>女</td>\n",
       "      <td>68</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "      age city sex score\n",
       "name                    \n",
       "Tom    18   北京   男    98\n",
       "Mary   25   广州   女    68"
      ]
     },
     "execution_count": 75,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "user_info.drop_duplicates(subset=['sex'])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "保留重复元素的最后一个"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 76,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>age</th>\n",
       "      <th>city</th>\n",
       "      <th>sex</th>\n",
       "      <th>score</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>name</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>Mary</th>\n",
       "      <td>25</td>\n",
       "      <td>广州</td>\n",
       "      <td>女</td>\n",
       "      <td>68</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Yafei</th>\n",
       "      <td>22</td>\n",
       "      <td>晋城</td>\n",
       "      <td>男</td>\n",
       "      <td>100分</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "       age city sex score\n",
       "name                     \n",
       "Mary    25   广州   女    68\n",
       "Yafei   22   晋城   男  100分"
      ]
     },
     "execution_count": 76,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "user_info.drop_duplicates(subset=['sex'], keep='last')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "在传入多列时等价于将多列共同视作一个多级索引，比较重复项："
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 77,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>age</th>\n",
       "      <th>city</th>\n",
       "      <th>sex</th>\n",
       "      <th>score</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>name</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>Tom</th>\n",
       "      <td>18</td>\n",
       "      <td>北京</td>\n",
       "      <td>男</td>\n",
       "      <td>98</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Bob</th>\n",
       "      <td>30</td>\n",
       "      <td>上海</td>\n",
       "      <td>男</td>\n",
       "      <td>78</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Mary</th>\n",
       "      <td>25</td>\n",
       "      <td>广州</td>\n",
       "      <td>女</td>\n",
       "      <td>68</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>James</th>\n",
       "      <td>40</td>\n",
       "      <td>深圳</td>\n",
       "      <td>男</td>\n",
       "      <td>88</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Yafei</th>\n",
       "      <td>22</td>\n",
       "      <td>晋城</td>\n",
       "      <td>男</td>\n",
       "      <td>100分</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "       age city sex score\n",
       "name                     \n",
       "Tom     18   北京   男    98\n",
       "Bob     30   上海   男    78\n",
       "Mary    25   广州   女    68\n",
       "James   40   深圳   男    88\n",
       "Yafei   22   晋城   男  100分"
      ]
     },
     "execution_count": 77,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "user_info.drop_duplicates(subset=['sex', 'age'])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 抽样函数"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### sample\n",
    "> sample(n=None, frac=None,replace=False,weights=None,random_state=None,axis=None)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "n为样本量"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 78,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>age</th>\n",
       "      <th>city</th>\n",
       "      <th>sex</th>\n",
       "      <th>score</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>name</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>Yafei</th>\n",
       "      <td>22</td>\n",
       "      <td>晋城</td>\n",
       "      <td>男</td>\n",
       "      <td>100分</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Mary</th>\n",
       "      <td>25</td>\n",
       "      <td>广州</td>\n",
       "      <td>女</td>\n",
       "      <td>68</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Tom</th>\n",
       "      <td>18</td>\n",
       "      <td>北京</td>\n",
       "      <td>男</td>\n",
       "      <td>98</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "       age city sex score\n",
       "name                     \n",
       "Yafei   22   晋城   男  100分\n",
       "Mary    25   广州   女    68\n",
       "Tom     18   北京   男    98"
      ]
     },
     "execution_count": 78,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "user_info.sample(n=3)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "frac为抽样比"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 79,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>age</th>\n",
       "      <th>city</th>\n",
       "      <th>sex</th>\n",
       "      <th>score</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>name</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>James</th>\n",
       "      <td>40</td>\n",
       "      <td>深圳</td>\n",
       "      <td>男</td>\n",
       "      <td>88</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Mary</th>\n",
       "      <td>25</td>\n",
       "      <td>广州</td>\n",
       "      <td>女</td>\n",
       "      <td>68</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "       age city sex score\n",
       "name                     \n",
       "James   40   深圳   男    88\n",
       "Mary    25   广州   女    68"
      ]
     },
     "execution_count": 79,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "user_info.sample(frac=0.5)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "replace为是否放回"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 80,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>age</th>\n",
       "      <th>city</th>\n",
       "      <th>sex</th>\n",
       "      <th>score</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>name</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>Mary</th>\n",
       "      <td>25</td>\n",
       "      <td>广州</td>\n",
       "      <td>女</td>\n",
       "      <td>68</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Tom</th>\n",
       "      <td>18</td>\n",
       "      <td>北京</td>\n",
       "      <td>男</td>\n",
       "      <td>98</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Mary</th>\n",
       "      <td>25</td>\n",
       "      <td>广州</td>\n",
       "      <td>女</td>\n",
       "      <td>68</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Tom</th>\n",
       "      <td>18</td>\n",
       "      <td>北京</td>\n",
       "      <td>男</td>\n",
       "      <td>98</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Bob</th>\n",
       "      <td>30</td>\n",
       "      <td>上海</td>\n",
       "      <td>男</td>\n",
       "      <td>78</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "      age city sex score\n",
       "name                    \n",
       "Mary   25   广州   女    68\n",
       "Tom    18   北京   男    98\n",
       "Mary   25   广州   女    68\n",
       "Tom    18   北京   男    98\n",
       "Bob    30   上海   男    78"
      ]
     },
     "execution_count": 80,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "user_info.sample(n=5, replace=True)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "axis为抽样维度，默认为0，即抽行"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 81,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>age</th>\n",
       "      <th>city</th>\n",
       "      <th>sex</th>\n",
       "      <th>score</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>name</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>Tom</th>\n",
       "      <td>18</td>\n",
       "      <td>北京</td>\n",
       "      <td>男</td>\n",
       "      <td>98</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>James</th>\n",
       "      <td>40</td>\n",
       "      <td>深圳</td>\n",
       "      <td>男</td>\n",
       "      <td>88</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Yafei</th>\n",
       "      <td>22</td>\n",
       "      <td>晋城</td>\n",
       "      <td>男</td>\n",
       "      <td>100分</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Bob</th>\n",
       "      <td>30</td>\n",
       "      <td>上海</td>\n",
       "      <td>男</td>\n",
       "      <td>78</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Mary</th>\n",
       "      <td>25</td>\n",
       "      <td>广州</td>\n",
       "      <td>女</td>\n",
       "      <td>68</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "       age city sex score\n",
       "name                     \n",
       "Tom     18   北京   男    98\n",
       "James   40   深圳   男    88\n",
       "Yafei   22   晋城   男  100分\n",
       "Bob     30   上海   男    78\n",
       "Mary    25   广州   女    68"
      ]
     },
     "execution_count": 81,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "user_info.sample(n=5, axis=0)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 82,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>sex</th>\n",
       "      <th>city</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>name</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>Tom</th>\n",
       "      <td>男</td>\n",
       "      <td>北京</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Bob</th>\n",
       "      <td>男</td>\n",
       "      <td>上海</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Mary</th>\n",
       "      <td>女</td>\n",
       "      <td>广州</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>James</th>\n",
       "      <td>男</td>\n",
       "      <td>深圳</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Yafei</th>\n",
       "      <td>男</td>\n",
       "      <td>晋城</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "      sex city\n",
       "name          \n",
       "Tom     男   北京\n",
       "Bob     男   上海\n",
       "Mary    女   广州\n",
       "James   男   深圳\n",
       "Yafei   男   晋城"
      ]
     },
     "execution_count": 82,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "user_info.sample(n=2, axis=1)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "weights为样本权重，自动归一化"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 83,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>age</th>\n",
       "      <th>city</th>\n",
       "      <th>sex</th>\n",
       "      <th>score</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>name</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>Bob</th>\n",
       "      <td>30</td>\n",
       "      <td>上海</td>\n",
       "      <td>男</td>\n",
       "      <td>78</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Tom</th>\n",
       "      <td>18</td>\n",
       "      <td>北京</td>\n",
       "      <td>男</td>\n",
       "      <td>98</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Mary</th>\n",
       "      <td>25</td>\n",
       "      <td>广州</td>\n",
       "      <td>女</td>\n",
       "      <td>68</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "      age city sex score\n",
       "name                    \n",
       "Bob    30   上海   男    78\n",
       "Tom    18   北京   男    98\n",
       "Mary   25   广州   女    68"
      ]
     },
     "execution_count": 83,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "user_info.sample(n=3,weights=np.random.rand(user_info.shape[0]))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 84,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>age</th>\n",
       "      <th>city</th>\n",
       "      <th>sex</th>\n",
       "      <th>score</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>name</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>Tom</th>\n",
       "      <td>18</td>\n",
       "      <td>北京</td>\n",
       "      <td>男</td>\n",
       "      <td>98</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>James</th>\n",
       "      <td>40</td>\n",
       "      <td>深圳</td>\n",
       "      <td>男</td>\n",
       "      <td>88</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Bob</th>\n",
       "      <td>30</td>\n",
       "      <td>上海</td>\n",
       "      <td>男</td>\n",
       "      <td>78</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "       age city sex score\n",
       "name                     \n",
       "Tom     18   北京   男    98\n",
       "James   40   深圳   男    88\n",
       "Bob     30   上海   男    78"
      ]
     },
     "execution_count": 84,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# 以某一列为权重，这在抽样理论中很常见\n",
    "# 抽到的概率与Math数值成正比\n",
    "user_info.sample(n=3,weights=user_info['age'])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 函数应用\n",
    "虽说 Pandas 为我们提供了非常丰富的函数，有时候我们可能需要自己定制一些函数，并将它应用到 DataFrame 或 Series。常用到的函数有：map、apply、applymap。\n",
    "\n",
    "- map 是 Series 中特有的方法，通过它可以对 Series 中的每个元素实现转换。如果我想通过年龄判断用户是否属于中年人（30岁以上为中年），通过 map 可以轻松搞定它。我想要通过城市来判断是南方还是北方，我可以这样操作。\n",
    "- apply 方法既支持 Series，也支持 DataFrame，在对 Series 操作时会作用到每个值上，在对 DataFrame 操作时会作用到所有行或所有列（通过 axis 参数控制）。# 对 Series 来说，apply 方法 与 map 方法区别不大。\n",
    "- applymap 方法针对于 DataFrame，它作用于 DataFrame 中的每个元素，它对 DataFrame 的效果类似于 apply 对 Series 的效果"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 85,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "name\n",
       "Tom       no\n",
       "Bob      yes\n",
       "Mary      no\n",
       "James    yes\n",
       "Yafei     no\n",
       "Name: age, dtype: object"
      ]
     },
     "execution_count": 85,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# 通过年龄判断用户是否属于中年人（30岁以上为中年）\n",
    "user_info.age.map(lambda x: \"yes\" if x >= 30 else \"no\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 86,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "name\n",
       "Tom      north\n",
       "Bob      south\n",
       "Mary     south\n",
       "James    south\n",
       "Yafei    north\n",
       "Name: city, dtype: object"
      ]
     },
     "execution_count": 86,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# 通过城市来判断是南方还是北方\n",
    "city_map = {\n",
    "    \"北京\": \"north\",\n",
    "    \"上海\": \"south\",\n",
    "    \"广州\": \"south\",\n",
    "    \"深圳\": \"south\",\n",
    "    \"晋城\": \"north\",\n",
    "}\n",
    "user_info.city.map(city_map)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 87,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "age      40\n",
       "city     深圳\n",
       "sex       男\n",
       "score    98\n",
       "dtype: object"
      ]
     },
     "execution_count": 87,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# 求每一列的最大值\n",
    "user_info.apply(lambda x: x.max(), axis=0)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 88,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "name\n",
       "Tom      19\n",
       "Bob      31\n",
       "Mary     26\n",
       "James    41\n",
       "Yafei    23\n",
       "dtype: int64"
      ]
     },
     "execution_count": 88,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# 求每一行的age加1\n",
    "user_info.apply(lambda x: x.age + 1, axis=1)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 89,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>age</th>\n",
       "      <th>city</th>\n",
       "      <th>sex</th>\n",
       "      <th>score</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>name</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>Tom</th>\n",
       "      <td>18applymap</td>\n",
       "      <td>北京applymap</td>\n",
       "      <td>男applymap</td>\n",
       "      <td>98applymap</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Bob</th>\n",
       "      <td>30applymap</td>\n",
       "      <td>上海applymap</td>\n",
       "      <td>男applymap</td>\n",
       "      <td>78applymap</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Mary</th>\n",
       "      <td>25applymap</td>\n",
       "      <td>广州applymap</td>\n",
       "      <td>女applymap</td>\n",
       "      <td>68applymap</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>James</th>\n",
       "      <td>40applymap</td>\n",
       "      <td>深圳applymap</td>\n",
       "      <td>男applymap</td>\n",
       "      <td>88applymap</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Yafei</th>\n",
       "      <td>22applymap</td>\n",
       "      <td>晋城applymap</td>\n",
       "      <td>男applymap</td>\n",
       "      <td>100分applymap</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "              age        city        sex         score\n",
       "name                                                  \n",
       "Tom    18applymap  北京applymap  男applymap    98applymap\n",
       "Bob    30applymap  上海applymap  男applymap    78applymap\n",
       "Mary   25applymap  广州applymap  女applymap    68applymap\n",
       "James  40applymap  深圳applymap  男applymap    88applymap\n",
       "Yafei  22applymap  晋城applymap  男applymap  100分applymap"
      ]
     },
     "execution_count": 89,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# 将每个值转化为小写字符串\n",
    "user_info.applymap(lambda x: str(x) + 'applymap')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 第5部分：DataFrame的增删改查"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 数据准备"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 90,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>age</th>\n",
       "      <th>city</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>name</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>Tom</th>\n",
       "      <td>18</td>\n",
       "      <td>北京</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Bob</th>\n",
       "      <td>30</td>\n",
       "      <td>上海</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Mary</th>\n",
       "      <td>25</td>\n",
       "      <td>广州</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>James</th>\n",
       "      <td>40</td>\n",
       "      <td>深圳</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Yafei</th>\n",
       "      <td>22</td>\n",
       "      <td>晋城</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "       age city\n",
       "name           \n",
       "Tom     18   北京\n",
       "Bob     30   上海\n",
       "Mary    25   广州\n",
       "James   40   深圳\n",
       "Yafei   22   晋城"
      ]
     },
     "execution_count": 90,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# 指定索引(index)\n",
    "data = {\n",
    "    \"age\": [18, 30, 25, 40, 22],\n",
    "    \"city\": [\"北京\", \"上海\", \"广州\", \"深圳\", \"晋城\"]\n",
    "}\n",
    "index = pd.Index([\"Tom\", \"Bob\", \"Mary\", \"James\", \"Yafei\"], name='name')  # pd.Index将列表转换为具有轴标签的索引列表\n",
    "user_info = pd.DataFrame(data, index=index)\n",
    "user_info"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "DataFrame 就像带索引的 Series 字典，提取、设置、删除列的操作与字典类似："
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 索引/查找\n",
    "\n",
    "**从查找方法角度分为三种：**\n",
    "- 按标签/索引名/列名）查找\n",
    "- 按序号/位置查找\n",
    "- 按布尔条件查找  \n",
    "  注：& | ~  且 或 非 多个布尔条件必须加小括号\n",
    "- 按数据类型查找\n",
    "   \n",
    "\n",
    "**从语法角度分为5种：**\n",
    "- df.col: 按列名选择列（不推荐）,若该列不存在会报错\n",
    "- df.get(col, default): 按列名查找指定列数据，若不存在不错误，可以设置默认值\n",
    "- df[]\n",
    "   - [col]：按列名查找单列\n",
    "   - [[col1. col2, ...]]: 按列名查找多列\n",
    "   - [start: end]: 按序号进行切片查找\n",
    "   - [bool表达式]：按布尔条件进行查找     \n",
    "\n",
    "- df.loc[行标签，列标签]（推荐）\n",
    "   - 行和列可以为行索引名或列名，表示查找单行或单列\n",
    "   - 行和列可以为行索引列表或列名列表，表示查找多行或多列\n",
    "   - 行和列可以为布尔表达式，根据布尔条件进行行和列的查找\n",
    "- df.iloc[行序号，列序号]\n",
    "   - 行和列可以为行序号或列序号，表示查找单行或单列\n",
    "   - 行和列可以为行序号列表或列序号列表，表示查找多行或多列\n",
    "   - 行和列可以为行序号切片或列序号切片，表示查找连续多行或连续多列\n",
    "- df.select_dtypes(include=None, exclude=None): 按数据类型查找指定列"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### 查找指定列数据"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**按列名查找**"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "按列名查找单列"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 91,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "name\n",
       "Tom      18\n",
       "Bob      30\n",
       "Mary     25\n",
       "James    40\n",
       "Yafei    22\n",
       "Name: age, dtype: int64"
      ]
     },
     "execution_count": 91,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "user_info['age']  # 单列"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 92,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "name\n",
       "Tom      18\n",
       "Bob      30\n",
       "Mary     25\n",
       "James    40\n",
       "Yafei    22\n",
       "Name: age, dtype: int64"
      ]
     },
     "execution_count": 92,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "user_info.age"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 93,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "name\n",
       "Tom      18\n",
       "Bob      30\n",
       "Mary     25\n",
       "James    40\n",
       "Yafei    22\n",
       "Name: age, dtype: int64"
      ]
     },
     "execution_count": 93,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "user_info.get('age')  # get方式获取指定列数据，若列不存在不会报错， 类似于字典的get"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 94,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "'18'"
      ]
     },
     "execution_count": 94,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "user_info.get('age2', '18')  # 可以设置默认值，若指定列不存在，则返回默认值"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 95,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "name\n",
       "Tom      18\n",
       "Bob      30\n",
       "Mary     25\n",
       "James    40\n",
       "Yafei    22\n",
       "Name: age, dtype: int64"
      ]
     },
     "execution_count": 95,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "user_info.loc[:, 'age']"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "按列名查找多列"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 96,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>age</th>\n",
       "      <th>city</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>name</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>Tom</th>\n",
       "      <td>18</td>\n",
       "      <td>北京</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Bob</th>\n",
       "      <td>30</td>\n",
       "      <td>上海</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Mary</th>\n",
       "      <td>25</td>\n",
       "      <td>广州</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>James</th>\n",
       "      <td>40</td>\n",
       "      <td>深圳</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Yafei</th>\n",
       "      <td>22</td>\n",
       "      <td>晋城</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "       age city\n",
       "name           \n",
       "Tom     18   北京\n",
       "Bob     30   上海\n",
       "Mary    25   广州\n",
       "James   40   深圳\n",
       "Yafei   22   晋城"
      ]
     },
     "execution_count": 96,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "user_info[['age', 'city']]  # 多列\n",
    "# user_info.loc[:, ['age', 'city']]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**按列序号查找**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 97,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "name\n",
       "Tom      北京\n",
       "Bob      上海\n",
       "Mary     广州\n",
       "James    深圳\n",
       "Yafei    晋城\n",
       "Name: city, dtype: object"
      ]
     },
     "execution_count": 97,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "user_info.iloc[:, 1]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 98,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>age</th>\n",
       "      <th>city</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>name</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>Tom</th>\n",
       "      <td>18</td>\n",
       "      <td>北京</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Bob</th>\n",
       "      <td>30</td>\n",
       "      <td>上海</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Mary</th>\n",
       "      <td>25</td>\n",
       "      <td>广州</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>James</th>\n",
       "      <td>40</td>\n",
       "      <td>深圳</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Yafei</th>\n",
       "      <td>22</td>\n",
       "      <td>晋城</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "       age city\n",
       "name           \n",
       "Tom     18   北京\n",
       "Bob     30   上海\n",
       "Mary    25   广州\n",
       "James   40   深圳\n",
       "Yafei   22   晋城"
      ]
     },
     "execution_count": 98,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "user_info.iloc[:, [0, 1]]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "按数据类型查找"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 99,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>age</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>name</th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>Tom</th>\n",
       "      <td>18</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Bob</th>\n",
       "      <td>30</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Mary</th>\n",
       "      <td>25</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>James</th>\n",
       "      <td>40</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Yafei</th>\n",
       "      <td>22</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "       age\n",
       "name      \n",
       "Tom     18\n",
       "Bob     30\n",
       "Mary    25\n",
       "James   40\n",
       "Yafei   22"
      ]
     },
     "execution_count": 99,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "user_info.select_dtypes(include='int64')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### 查找指定行数据"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "根据行标签/索引"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 100,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "age     18\n",
       "city    北京\n",
       "Name: Tom, dtype: object"
      ]
     },
     "execution_count": 100,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "user_info.loc['Tom']  # 单行"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 101,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>age</th>\n",
       "      <th>city</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>name</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>Tom</th>\n",
       "      <td>18</td>\n",
       "      <td>北京</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Bob</th>\n",
       "      <td>30</td>\n",
       "      <td>上海</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "      age city\n",
       "name          \n",
       "Tom    18   北京\n",
       "Bob    30   上海"
      ]
     },
     "execution_count": 101,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "user_info.loc[['Tom', 'Bob']]  # 多行"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "根据行序号"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 102,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "age     18\n",
       "city    北京\n",
       "Name: Tom, dtype: object"
      ]
     },
     "execution_count": 102,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "user_info.iloc[0]  # 第一行"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 103,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>age</th>\n",
       "      <th>city</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>name</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>Tom</th>\n",
       "      <td>18</td>\n",
       "      <td>北京</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Bob</th>\n",
       "      <td>30</td>\n",
       "      <td>上海</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "      age city\n",
       "name          \n",
       "Tom    18   北京\n",
       "Bob    30   上海"
      ]
     },
     "execution_count": 103,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "user_info.iloc[[0,1]]  # 前两行"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 104,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>age</th>\n",
       "      <th>city</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>name</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>Tom</th>\n",
       "      <td>18</td>\n",
       "      <td>北京</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Bob</th>\n",
       "      <td>30</td>\n",
       "      <td>上海</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "      age city\n",
       "name          \n",
       "Tom    18   北京\n",
       "Bob    30   上海"
      ]
     },
     "execution_count": 104,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "user_info.iloc[0:2]  # 使用序号切片获取前两行"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 105,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>age</th>\n",
       "      <th>city</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>name</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>Tom</th>\n",
       "      <td>18</td>\n",
       "      <td>北京</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "      age city\n",
       "name          \n",
       "Tom    18   北京"
      ]
     },
     "execution_count": 105,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "user_info[0:1] # 第0行 使用切片 "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 106,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>age</th>\n",
       "      <th>city</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>name</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>Tom</th>\n",
       "      <td>18</td>\n",
       "      <td>北京</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Bob</th>\n",
       "      <td>30</td>\n",
       "      <td>上海</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "      age city\n",
       "name          \n",
       "Tom    18   北京\n",
       "Bob    30   上海"
      ]
     },
     "execution_count": 106,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "user_info[0:2]  # 前两行  使用切片"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "布尔查找"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 107,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>age</th>\n",
       "      <th>city</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>name</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>Bob</th>\n",
       "      <td>30</td>\n",
       "      <td>上海</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>James</th>\n",
       "      <td>40</td>\n",
       "      <td>深圳</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "       age city\n",
       "name           \n",
       "Bob     30   上海\n",
       "James   40   深圳"
      ]
     },
     "execution_count": 107,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# 查找年龄大于等于30的数据\n",
    "user_info[user_info.age >= 30]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 108,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>age</th>\n",
       "      <th>city</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>name</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>Bob</th>\n",
       "      <td>30</td>\n",
       "      <td>上海</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "      age city\n",
       "name          \n",
       "Bob    30   上海"
      ]
     },
     "execution_count": 108,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# 查找年龄大于等于30且city为上海的数据\n",
    "user_info[(user_info.age >= 30) & (user_info.city == '上海')]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 109,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>age</th>\n",
       "      <th>city</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>name</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>James</th>\n",
       "      <td>40</td>\n",
       "      <td>深圳</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Yafei</th>\n",
       "      <td>22</td>\n",
       "      <td>晋城</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "       age city\n",
       "name           \n",
       "James   40   深圳\n",
       "Yafei   22   晋城"
      ]
     },
     "execution_count": 109,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# 查找city为晋城或者深圳的数据\n",
    "user_info[(user_info.city == '晋城') | (user_info.city == '深圳')]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "还记得我们将Series时的isin方法吗？会对Series中每一个值进行成员运算，返回结果为布尔值。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 110,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>age</th>\n",
       "      <th>city</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>name</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>James</th>\n",
       "      <td>40</td>\n",
       "      <td>深圳</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Yafei</th>\n",
       "      <td>22</td>\n",
       "      <td>晋城</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "       age city\n",
       "name           \n",
       "James   40   深圳\n",
       "Yafei   22   晋城"
      ]
     },
     "execution_count": 110,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# 查找city为晋城或者深圳的数据\n",
    "user_info[user_info.city.isin(['晋城','深圳'])]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 111,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>age</th>\n",
       "      <th>city</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>name</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>Bob</th>\n",
       "      <td>30</td>\n",
       "      <td>上海</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Mary</th>\n",
       "      <td>25</td>\n",
       "      <td>广州</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>James</th>\n",
       "      <td>40</td>\n",
       "      <td>深圳</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Yafei</th>\n",
       "      <td>22</td>\n",
       "      <td>晋城</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "       age city\n",
       "name           \n",
       "Bob     30   上海\n",
       "Mary    25   广州\n",
       "James   40   深圳\n",
       "Yafei   22   晋城"
      ]
     },
     "execution_count": 111,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# 查找age不等于20，city不是北京的数据\n",
    "user_info[(user_info.age != 20) & (user_info.city != '北京')]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 112,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>age</th>\n",
       "      <th>city</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>name</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>Bob</th>\n",
       "      <td>30</td>\n",
       "      <td>上海</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Mary</th>\n",
       "      <td>25</td>\n",
       "      <td>广州</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>James</th>\n",
       "      <td>40</td>\n",
       "      <td>深圳</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Yafei</th>\n",
       "      <td>22</td>\n",
       "      <td>晋城</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "       age city\n",
       "name           \n",
       "Bob     30   上海\n",
       "Mary    25   广州\n",
       "James   40   深圳\n",
       "Yafei   22   晋城"
      ]
     },
     "execution_count": 112,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# 查找age不等于20，city不是北京的数据\n",
    "user_info[~((user_info.age == 20) | (user_info.city == '北京'))]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### 查找指定行指定列数据"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "按标签查找"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "按标签查找指定行指定列"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 113,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "18"
      ]
     },
     "execution_count": 113,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# 查找行标签为Tom的age列\n",
    "user_info.loc['Tom', 'age']"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "按标签查找指定多行指定多列"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 114,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>age</th>\n",
       "      <th>city</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>name</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>Tom</th>\n",
       "      <td>18</td>\n",
       "      <td>北京</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>James</th>\n",
       "      <td>40</td>\n",
       "      <td>深圳</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "       age city\n",
       "name           \n",
       "Tom     18   北京\n",
       "James   40   深圳"
      ]
     },
     "execution_count": 114,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# 查找行标签为Tom和James的age和city列\n",
    "user_info.loc[['Tom', 'James'], ['age', 'city']]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "按布尔条件查找指定行指定列"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 115,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "name\n",
       "Bob      上海\n",
       "James    深圳\n",
       "Name: city, dtype: object"
      ]
     },
     "execution_count": 115,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# 查找年龄大于等于30的人的city\n",
    "user_info.loc[user_info.age >= 30, 'city']"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "按序号查找"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "按序号查询指定行指定列"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 116,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "18"
      ]
     },
     "execution_count": 116,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# 查找第1行第1列数据\n",
    "user_info.iloc[0, 0]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "按序号查找指定多行指定多列"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 117,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>age</th>\n",
       "      <th>city</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>name</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>Tom</th>\n",
       "      <td>18</td>\n",
       "      <td>北京</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>James</th>\n",
       "      <td>40</td>\n",
       "      <td>深圳</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "       age city\n",
       "name           \n",
       "Tom     18   北京\n",
       "James   40   深圳"
      ]
     },
     "execution_count": 117,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# 查找第1和第4行，第1和第2列数据\n",
    "user_info.iloc[[0, 3], [0, 1]]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "按序号切片查找指定多行指定多列"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 118,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>age</th>\n",
       "      <th>city</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>name</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>Tom</th>\n",
       "      <td>18</td>\n",
       "      <td>北京</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Bob</th>\n",
       "      <td>30</td>\n",
       "      <td>上海</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Mary</th>\n",
       "      <td>25</td>\n",
       "      <td>广州</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "      age city\n",
       "name          \n",
       "Tom    18   北京\n",
       "Bob    30   上海\n",
       "Mary   25   广州"
      ]
     },
     "execution_count": 118,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# 查询前3行前2列数据\n",
    "user_info.iloc[0:3, 0:2]\n",
    "# user_info.iloc[:3, :2]  # 两种写法一样"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 添加\n",
    "涉及到的相关方法如下：\n",
    "- df[new_col] = data, data为类列表数据，添加列new_col\n",
    "- df.append(other): other为DataFrame、Series或类dict,或此类型的list，在df末尾添加新行（Series和dict为单行，DataFrame或list是多行）\n",
    "- df.insert(loc, column, value): 在制定位置插入新列\n",
    "- df.loc[index， col] = data，添加行或新列(推荐)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 119,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>age</th>\n",
       "      <th>city</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>name</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>Tom</th>\n",
       "      <td>18</td>\n",
       "      <td>北京</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Bob</th>\n",
       "      <td>30</td>\n",
       "      <td>上海</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Mary</th>\n",
       "      <td>25</td>\n",
       "      <td>广州</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>James</th>\n",
       "      <td>40</td>\n",
       "      <td>深圳</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Yafei</th>\n",
       "      <td>22</td>\n",
       "      <td>晋城</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "       age city\n",
       "name           \n",
       "Tom     18   北京\n",
       "Bob     30   上海\n",
       "Mary    25   广州\n",
       "James   40   深圳\n",
       "Yafei   22   晋城"
      ]
     },
     "execution_count": 119,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "user_info"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### 添加新列"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 120,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>age</th>\n",
       "      <th>city</th>\n",
       "      <th>sex</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>name</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>Tom</th>\n",
       "      <td>18</td>\n",
       "      <td>北京</td>\n",
       "      <td>男</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Bob</th>\n",
       "      <td>30</td>\n",
       "      <td>上海</td>\n",
       "      <td>男</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Mary</th>\n",
       "      <td>25</td>\n",
       "      <td>广州</td>\n",
       "      <td>女</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>James</th>\n",
       "      <td>40</td>\n",
       "      <td>深圳</td>\n",
       "      <td>男</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Yafei</th>\n",
       "      <td>22</td>\n",
       "      <td>晋城</td>\n",
       "      <td>男</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "       age city sex\n",
       "name               \n",
       "Tom     18   北京   男\n",
       "Bob     30   上海   男\n",
       "Mary    25   广州   女\n",
       "James   40   深圳   男\n",
       "Yafei   22   晋城   男"
      ]
     },
     "execution_count": 120,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "user_info['sex'] = ['男', '男', '女', '男', '男']\n",
    "user_info"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "在指定位置插入新列"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 121,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>age</th>\n",
       "      <th>hobby</th>\n",
       "      <th>city</th>\n",
       "      <th>sex</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>name</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>Tom</th>\n",
       "      <td>18</td>\n",
       "      <td>羽毛球</td>\n",
       "      <td>北京</td>\n",
       "      <td>男</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Bob</th>\n",
       "      <td>30</td>\n",
       "      <td>棒球</td>\n",
       "      <td>上海</td>\n",
       "      <td>男</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Mary</th>\n",
       "      <td>25</td>\n",
       "      <td>舞蹈</td>\n",
       "      <td>广州</td>\n",
       "      <td>女</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>James</th>\n",
       "      <td>40</td>\n",
       "      <td>看球</td>\n",
       "      <td>深圳</td>\n",
       "      <td>男</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Yafei</th>\n",
       "      <td>22</td>\n",
       "      <td>跑步</td>\n",
       "      <td>晋城</td>\n",
       "      <td>男</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "       age hobby city sex\n",
       "name                     \n",
       "Tom     18   羽毛球   北京   男\n",
       "Bob     30    棒球   上海   男\n",
       "Mary    25    舞蹈   广州   女\n",
       "James   40    看球   深圳   男\n",
       "Yafei   22    跑步   晋城   男"
      ]
     },
     "execution_count": 121,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "user_info.insert(1, 'hobby', ['羽毛球', '棒球', '舞蹈', '看球', '跑步'])\n",
    "user_info"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "DataFrame 提供了 assign() 方法，可以利用现有的列创建新列。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 122,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>age</th>\n",
       "      <th>hobby</th>\n",
       "      <th>city</th>\n",
       "      <th>sex</th>\n",
       "      <th>desc</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>name</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>Tom</th>\n",
       "      <td>18</td>\n",
       "      <td>羽毛球</td>\n",
       "      <td>北京</td>\n",
       "      <td>男</td>\n",
       "      <td>北京- 男-羽毛球</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Bob</th>\n",
       "      <td>30</td>\n",
       "      <td>棒球</td>\n",
       "      <td>上海</td>\n",
       "      <td>男</td>\n",
       "      <td>上海- 男-棒球</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Mary</th>\n",
       "      <td>25</td>\n",
       "      <td>舞蹈</td>\n",
       "      <td>广州</td>\n",
       "      <td>女</td>\n",
       "      <td>广州- 女-舞蹈</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>James</th>\n",
       "      <td>40</td>\n",
       "      <td>看球</td>\n",
       "      <td>深圳</td>\n",
       "      <td>男</td>\n",
       "      <td>深圳- 男-看球</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Yafei</th>\n",
       "      <td>22</td>\n",
       "      <td>跑步</td>\n",
       "      <td>晋城</td>\n",
       "      <td>男</td>\n",
       "      <td>晋城- 男-跑步</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "       age hobby city sex       desc\n",
       "name                                \n",
       "Tom     18   羽毛球   北京   男  北京- 男-羽毛球\n",
       "Bob     30    棒球   上海   男   上海- 男-棒球\n",
       "Mary    25    舞蹈   广州   女   广州- 女-舞蹈\n",
       "James   40    看球   深圳   男   深圳- 男-看球\n",
       "Yafei   22    跑步   晋城   男   晋城- 男-跑步"
      ]
     },
     "execution_count": 122,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "user_info.assign(desc=user_info['city'] + '- '+ user_info['sex'] + '-' + user_info['hobby'])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### 添加新行"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "最常用-推荐"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 123,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>age</th>\n",
       "      <th>hobby</th>\n",
       "      <th>city</th>\n",
       "      <th>sex</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>name</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>Tom</th>\n",
       "      <td>18</td>\n",
       "      <td>羽毛球</td>\n",
       "      <td>北京</td>\n",
       "      <td>男</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Bob</th>\n",
       "      <td>30</td>\n",
       "      <td>棒球</td>\n",
       "      <td>上海</td>\n",
       "      <td>男</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Mary</th>\n",
       "      <td>25</td>\n",
       "      <td>舞蹈</td>\n",
       "      <td>广州</td>\n",
       "      <td>女</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>James</th>\n",
       "      <td>40</td>\n",
       "      <td>看球</td>\n",
       "      <td>深圳</td>\n",
       "      <td>男</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Yafei</th>\n",
       "      <td>22</td>\n",
       "      <td>跑步</td>\n",
       "      <td>晋城</td>\n",
       "      <td>男</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Linda</th>\n",
       "      <td>28</td>\n",
       "      <td>长治</td>\n",
       "      <td>乒乓球</td>\n",
       "      <td>女</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "       age hobby city sex\n",
       "name                     \n",
       "Tom     18   羽毛球   北京   男\n",
       "Bob     30    棒球   上海   男\n",
       "Mary    25    舞蹈   广州   女\n",
       "James   40    看球   深圳   男\n",
       "Yafei   22    跑步   晋城   男\n",
       "Linda   28    长治  乒乓球   女"
      ]
     },
     "execution_count": 123,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "user_info.loc['Linda'] = [28, '乒乓球', '长治','女']\n",
    "user_info"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "使用append方式在DataFrame末尾添加添加新行  \n",
    "> append(other, ignore_index=False, verify_integrity=False, sort=False) \n",
    ">  - other: DataFrame、Series或类dict型对象，或这种对象的list\n",
    ">  - ignore_index: bool,default False, 如果为True,新生成的DataFrame的行标签将会被标记为0,1,2...n-1\n",
    ">  - verify_integrity : bool, default False, 如果为True，添加时会判断索引是否重复，如果之前添加过该索引数据，则raise ValueError\n",
    ">  - sort : bool, default False，如果为True,如果两个对象的列没对齐，则进行排序\n",
    "\n",
    "注意：append不会改变自身DataFrame，会将添加之后的数据生成一个新的DataFrame"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 124,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>age</th>\n",
       "      <th>hobby</th>\n",
       "      <th>city</th>\n",
       "      <th>sex</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>name</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>Tom</th>\n",
       "      <td>18</td>\n",
       "      <td>羽毛球</td>\n",
       "      <td>北京</td>\n",
       "      <td>男</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Bob</th>\n",
       "      <td>30</td>\n",
       "      <td>棒球</td>\n",
       "      <td>上海</td>\n",
       "      <td>男</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Mary</th>\n",
       "      <td>25</td>\n",
       "      <td>舞蹈</td>\n",
       "      <td>广州</td>\n",
       "      <td>女</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>James</th>\n",
       "      <td>40</td>\n",
       "      <td>看球</td>\n",
       "      <td>深圳</td>\n",
       "      <td>男</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Yafei</th>\n",
       "      <td>22</td>\n",
       "      <td>跑步</td>\n",
       "      <td>晋城</td>\n",
       "      <td>男</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Linda</th>\n",
       "      <td>28</td>\n",
       "      <td>长治</td>\n",
       "      <td>乒乓球</td>\n",
       "      <td>女</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Lisa</th>\n",
       "      <td>23</td>\n",
       "      <td>火锅</td>\n",
       "      <td>重庆</td>\n",
       "      <td>女</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "       age hobby city sex\n",
       "name                     \n",
       "Tom     18   羽毛球   北京   男\n",
       "Bob     30    棒球   上海   男\n",
       "Mary    25    舞蹈   广州   女\n",
       "James   40    看球   深圳   男\n",
       "Yafei   22    跑步   晋城   男\n",
       "Linda   28    长治  乒乓球   女\n",
       "Lisa    23    火锅   重庆   女"
      ]
     },
     "execution_count": 124,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "user1 = pd.Series(data=[23, '重庆', '火锅', '女'], index=['age', 'city', 'hobby', 'sex'], name='Lisa')\n",
    "user_info.append(user1)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "插入字典对象，ignore_index必须为True"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 125,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>age</th>\n",
       "      <th>hobby</th>\n",
       "      <th>city</th>\n",
       "      <th>sex</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>18</td>\n",
       "      <td>羽毛球</td>\n",
       "      <td>北京</td>\n",
       "      <td>男</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>30</td>\n",
       "      <td>棒球</td>\n",
       "      <td>上海</td>\n",
       "      <td>男</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>25</td>\n",
       "      <td>舞蹈</td>\n",
       "      <td>广州</td>\n",
       "      <td>女</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>40</td>\n",
       "      <td>看球</td>\n",
       "      <td>深圳</td>\n",
       "      <td>男</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>22</td>\n",
       "      <td>跑步</td>\n",
       "      <td>晋城</td>\n",
       "      <td>男</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>28</td>\n",
       "      <td>长治</td>\n",
       "      <td>乒乓球</td>\n",
       "      <td>女</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>23</td>\n",
       "      <td>火锅</td>\n",
       "      <td>重庆</td>\n",
       "      <td>女</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   age hobby city sex\n",
       "0   18   羽毛球   北京   男\n",
       "1   30    棒球   上海   男\n",
       "2   25    舞蹈   广州   女\n",
       "3   40    看球   深圳   男\n",
       "4   22    跑步   晋城   男\n",
       "5   28    长治  乒乓球   女\n",
       "6   23    火锅   重庆   女"
      ]
     },
     "execution_count": 125,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "user2 = {'age': 23, 'city': '重庆', 'hobby': '火锅', 'sex': '女'}\n",
    "user_info.append(user2, ignore_index=True)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "append也可以添加DataFrame对象，即实现两个DataFrame数据合并，关于这部分内容，我们将在之后合并与连接这一章节进行讲解"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 修改\n",
    "涉及的相关方法如下：\n",
    "- df.columns = columns: columns为列表， 设置新列名\n",
    "- df.index = index, index为列表, 设置新索引\n",
    "- df.rename(index=None, columns=None, inplace=False): index修改索引，columns修改列名，inplace是否修改原始数据\n",
    "- df.loc[index， col] = value: 根据行或列标签修改指定数据\n",
    "- df.iloc[iindex, icol] = value: 根据行位置和列位置修改指定数据"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 126,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>age</th>\n",
       "      <th>hobby</th>\n",
       "      <th>city</th>\n",
       "      <th>sex</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>name</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>Tom</th>\n",
       "      <td>18</td>\n",
       "      <td>羽毛球</td>\n",
       "      <td>北京</td>\n",
       "      <td>男</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Bob</th>\n",
       "      <td>30</td>\n",
       "      <td>棒球</td>\n",
       "      <td>上海</td>\n",
       "      <td>男</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Mary</th>\n",
       "      <td>25</td>\n",
       "      <td>舞蹈</td>\n",
       "      <td>广州</td>\n",
       "      <td>女</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>James</th>\n",
       "      <td>40</td>\n",
       "      <td>看球</td>\n",
       "      <td>深圳</td>\n",
       "      <td>男</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Yafei</th>\n",
       "      <td>22</td>\n",
       "      <td>跑步</td>\n",
       "      <td>晋城</td>\n",
       "      <td>男</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Linda</th>\n",
       "      <td>28</td>\n",
       "      <td>长治</td>\n",
       "      <td>乒乓球</td>\n",
       "      <td>女</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "       age hobby city sex\n",
       "name                     \n",
       "Tom     18   羽毛球   北京   男\n",
       "Bob     30    棒球   上海   男\n",
       "Mary    25    舞蹈   广州   女\n",
       "James   40    看球   深圳   男\n",
       "Yafei   22    跑步   晋城   男\n",
       "Linda   28    长治  乒乓球   女"
      ]
     },
     "execution_count": 126,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "user_info"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### 修改列名和索引"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "方式1：rename方式"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 127,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>年龄</th>\n",
       "      <th>爱好</th>\n",
       "      <th>城市</th>\n",
       "      <th>性别</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>name</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>汤姆</th>\n",
       "      <td>18</td>\n",
       "      <td>羽毛球</td>\n",
       "      <td>北京</td>\n",
       "      <td>男</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>鲍勃</th>\n",
       "      <td>30</td>\n",
       "      <td>棒球</td>\n",
       "      <td>上海</td>\n",
       "      <td>男</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>麦瑞</th>\n",
       "      <td>25</td>\n",
       "      <td>舞蹈</td>\n",
       "      <td>广州</td>\n",
       "      <td>女</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>詹姆斯</th>\n",
       "      <td>40</td>\n",
       "      <td>看球</td>\n",
       "      <td>深圳</td>\n",
       "      <td>男</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>亚飞</th>\n",
       "      <td>22</td>\n",
       "      <td>跑步</td>\n",
       "      <td>晋城</td>\n",
       "      <td>男</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>琳达</th>\n",
       "      <td>28</td>\n",
       "      <td>长治</td>\n",
       "      <td>乒乓球</td>\n",
       "      <td>女</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "      年龄   爱好   城市 性别\n",
       "name                 \n",
       "汤姆    18  羽毛球   北京  男\n",
       "鲍勃    30   棒球   上海  男\n",
       "麦瑞    25   舞蹈   广州  女\n",
       "詹姆斯   40   看球   深圳  男\n",
       "亚飞    22   跑步   晋城  男\n",
       "琳达    28   长治  乒乓球  女"
      ]
     },
     "execution_count": 127,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "columns = {'age': '年龄', 'hobby': '爱好', 'city': '城市', 'sex': '性别'}\n",
    "index = {'Tom': '汤姆', 'Bob': '鲍勃', 'Mary': '麦瑞', 'James': '詹姆斯', 'Yafei': '亚飞', 'Linda': '琳达'}\n",
    "user_info.rename(columns=columns, index=index)\n",
    "# 如果想修改原数据可以设置inplace=True\n",
    "# user_info.rename(columns=columns, index=index, inplace=True)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 128,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>age</th>\n",
       "      <th>hobby</th>\n",
       "      <th>city</th>\n",
       "      <th>sex</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>name</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>Tom</th>\n",
       "      <td>18</td>\n",
       "      <td>羽毛球</td>\n",
       "      <td>北京</td>\n",
       "      <td>男</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Bob</th>\n",
       "      <td>30</td>\n",
       "      <td>棒球</td>\n",
       "      <td>上海</td>\n",
       "      <td>男</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Mary</th>\n",
       "      <td>25</td>\n",
       "      <td>舞蹈</td>\n",
       "      <td>广州</td>\n",
       "      <td>女</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>James</th>\n",
       "      <td>40</td>\n",
       "      <td>看球</td>\n",
       "      <td>深圳</td>\n",
       "      <td>男</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Yafei</th>\n",
       "      <td>22</td>\n",
       "      <td>跑步</td>\n",
       "      <td>晋城</td>\n",
       "      <td>男</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Linda</th>\n",
       "      <td>28</td>\n",
       "      <td>长治</td>\n",
       "      <td>乒乓球</td>\n",
       "      <td>女</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "       age hobby city sex\n",
       "name                     \n",
       "Tom     18   羽毛球   北京   男\n",
       "Bob     30    棒球   上海   男\n",
       "Mary    25    舞蹈   广州   女\n",
       "James   40    看球   深圳   男\n",
       "Yafei   22    跑步   晋城   男\n",
       "Linda   28    长治  乒乓球   女"
      ]
     },
     "execution_count": 128,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "user_info"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "方式2：直接修改原数据"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 129,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>年龄</th>\n",
       "      <th>爱好</th>\n",
       "      <th>城市</th>\n",
       "      <th>性别</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>汤姆</th>\n",
       "      <td>18</td>\n",
       "      <td>羽毛球</td>\n",
       "      <td>北京</td>\n",
       "      <td>男</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>鲍勃</th>\n",
       "      <td>30</td>\n",
       "      <td>棒球</td>\n",
       "      <td>上海</td>\n",
       "      <td>男</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>麦瑞</th>\n",
       "      <td>25</td>\n",
       "      <td>舞蹈</td>\n",
       "      <td>广州</td>\n",
       "      <td>女</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>詹姆斯</th>\n",
       "      <td>40</td>\n",
       "      <td>看球</td>\n",
       "      <td>深圳</td>\n",
       "      <td>男</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>亚飞</th>\n",
       "      <td>22</td>\n",
       "      <td>跑步</td>\n",
       "      <td>晋城</td>\n",
       "      <td>男</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>琳达</th>\n",
       "      <td>28</td>\n",
       "      <td>长治</td>\n",
       "      <td>乒乓球</td>\n",
       "      <td>女</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "     年龄   爱好   城市 性别\n",
       "汤姆   18  羽毛球   北京  男\n",
       "鲍勃   30   棒球   上海  男\n",
       "麦瑞   25   舞蹈   广州  女\n",
       "詹姆斯  40   看球   深圳  男\n",
       "亚飞   22   跑步   晋城  男\n",
       "琳达   28   长治  乒乓球  女"
      ]
     },
     "execution_count": 129,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "user_info.columns = ['年龄', '爱好', '城市', '性别']\n",
    "user_info.index = ['汤姆', '鲍勃', '麦瑞', '詹姆斯', '亚飞', '琳达']\n",
    "user_info"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### 修改指定列数据\n",
    "此列存在为修改，若不存在则为添加新列"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 130,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>年龄</th>\n",
       "      <th>爱好</th>\n",
       "      <th>城市</th>\n",
       "      <th>性别</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>汤姆</th>\n",
       "      <td>19</td>\n",
       "      <td>羽毛球</td>\n",
       "      <td>北京</td>\n",
       "      <td>男</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>鲍勃</th>\n",
       "      <td>31</td>\n",
       "      <td>棒球</td>\n",
       "      <td>上海</td>\n",
       "      <td>男</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>麦瑞</th>\n",
       "      <td>26</td>\n",
       "      <td>舞蹈</td>\n",
       "      <td>广州</td>\n",
       "      <td>女</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>詹姆斯</th>\n",
       "      <td>41</td>\n",
       "      <td>看球</td>\n",
       "      <td>深圳</td>\n",
       "      <td>男</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>亚飞</th>\n",
       "      <td>23</td>\n",
       "      <td>跑步</td>\n",
       "      <td>晋城</td>\n",
       "      <td>男</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>琳达</th>\n",
       "      <td>29</td>\n",
       "      <td>长治</td>\n",
       "      <td>乒乓球</td>\n",
       "      <td>女</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "     年龄   爱好   城市 性别\n",
       "汤姆   19  羽毛球   北京  男\n",
       "鲍勃   31   棒球   上海  男\n",
       "麦瑞   26   舞蹈   广州  女\n",
       "詹姆斯  41   看球   深圳  男\n",
       "亚飞   23   跑步   晋城  男\n",
       "琳达   29   长治  乒乓球  女"
      ]
     },
     "execution_count": 130,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "user_info['年龄'] = user_info['年龄'] + 1\n",
    "user_info"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### 修改指定行数据"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 131,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>年龄</th>\n",
       "      <th>爱好</th>\n",
       "      <th>城市</th>\n",
       "      <th>性别</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>汤姆</th>\n",
       "      <td>18</td>\n",
       "      <td>排球</td>\n",
       "      <td>天津</td>\n",
       "      <td>男</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>鲍勃</th>\n",
       "      <td>31</td>\n",
       "      <td>棒球</td>\n",
       "      <td>上海</td>\n",
       "      <td>男</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>麦瑞</th>\n",
       "      <td>26</td>\n",
       "      <td>舞蹈</td>\n",
       "      <td>广州</td>\n",
       "      <td>女</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>詹姆斯</th>\n",
       "      <td>41</td>\n",
       "      <td>看球</td>\n",
       "      <td>深圳</td>\n",
       "      <td>男</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>亚飞</th>\n",
       "      <td>23</td>\n",
       "      <td>跑步</td>\n",
       "      <td>晋城</td>\n",
       "      <td>男</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>琳达</th>\n",
       "      <td>29</td>\n",
       "      <td>长治</td>\n",
       "      <td>乒乓球</td>\n",
       "      <td>女</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "     年龄  爱好   城市 性别\n",
       "汤姆   18  排球   天津  男\n",
       "鲍勃   31  棒球   上海  男\n",
       "麦瑞   26  舞蹈   广州  女\n",
       "詹姆斯  41  看球   深圳  男\n",
       "亚飞   23  跑步   晋城  男\n",
       "琳达   29  长治  乒乓球  女"
      ]
     },
     "execution_count": 131,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "user_info.loc['汤姆'] = [18, '排球', '天津', '男'] \n",
    "user_info"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### 修改指定行指定列"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "按标签修改"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 132,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>年龄</th>\n",
       "      <th>爱好</th>\n",
       "      <th>城市</th>\n",
       "      <th>性别</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>汤姆</th>\n",
       "      <td>18</td>\n",
       "      <td>排球</td>\n",
       "      <td>北京</td>\n",
       "      <td>男</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>鲍勃</th>\n",
       "      <td>31</td>\n",
       "      <td>棒球</td>\n",
       "      <td>上海</td>\n",
       "      <td>男</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>麦瑞</th>\n",
       "      <td>26</td>\n",
       "      <td>舞蹈</td>\n",
       "      <td>广州</td>\n",
       "      <td>女</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>詹姆斯</th>\n",
       "      <td>41</td>\n",
       "      <td>看球</td>\n",
       "      <td>深圳</td>\n",
       "      <td>男</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>亚飞</th>\n",
       "      <td>23</td>\n",
       "      <td>跑步</td>\n",
       "      <td>晋城</td>\n",
       "      <td>男</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>琳达</th>\n",
       "      <td>29</td>\n",
       "      <td>长治</td>\n",
       "      <td>乒乓球</td>\n",
       "      <td>女</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "     年龄  爱好   城市 性别\n",
       "汤姆   18  排球   北京  男\n",
       "鲍勃   31  棒球   上海  男\n",
       "麦瑞   26  舞蹈   广州  女\n",
       "詹姆斯  41  看球   深圳  男\n",
       "亚飞   23  跑步   晋城  男\n",
       "琳达   29  长治  乒乓球  女"
      ]
     },
     "execution_count": 132,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# 修改Tom的城市为北京\n",
    "user_info.loc['汤姆', '城市'] = '北京'\n",
    "user_info"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "按序号位置修改"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 133,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>年龄</th>\n",
       "      <th>爱好</th>\n",
       "      <th>城市</th>\n",
       "      <th>性别</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>汤姆</th>\n",
       "      <td>18</td>\n",
       "      <td>排球</td>\n",
       "      <td>北京</td>\n",
       "      <td>男</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>鲍勃</th>\n",
       "      <td>31</td>\n",
       "      <td>羽毛球</td>\n",
       "      <td>上海</td>\n",
       "      <td>男</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>麦瑞</th>\n",
       "      <td>26</td>\n",
       "      <td>舞蹈</td>\n",
       "      <td>广州</td>\n",
       "      <td>女</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>詹姆斯</th>\n",
       "      <td>41</td>\n",
       "      <td>看球</td>\n",
       "      <td>深圳</td>\n",
       "      <td>男</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>亚飞</th>\n",
       "      <td>23</td>\n",
       "      <td>跑步</td>\n",
       "      <td>晋城</td>\n",
       "      <td>男</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>琳达</th>\n",
       "      <td>29</td>\n",
       "      <td>长治</td>\n",
       "      <td>乒乓球</td>\n",
       "      <td>女</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "     年龄   爱好   城市 性别\n",
       "汤姆   18   排球   北京  男\n",
       "鲍勃   31  羽毛球   上海  男\n",
       "麦瑞   26   舞蹈   广州  女\n",
       "詹姆斯  41   看球   深圳  男\n",
       "亚飞   23   跑步   晋城  男\n",
       "琳达   29   长治  乒乓球  女"
      ]
     },
     "execution_count": 133,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# 这种方式使用的前提是了解指定位置的行或列的含义\n",
    "# 修改第2行第2列的元素值为羽毛球\n",
    "user_info.iloc[1, 1] = '羽毛球'\n",
    "user_info"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 删除\n",
    "- df.drop(index=None, columns=None, inplace=False)：index,根据索引删除行，columns，根据列名删除列，inplace是否修改原始数据，返回删除的行或列\n",
    "- df.pop(col): 删除指定列，返回删除列\n",
    "- del df[col]: 删除指定列，无返回值"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### 删除指定列"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "方式1：drop方式,可以选择是否修改原数据，有返回值"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 134,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>年龄</th>\n",
       "      <th>爱好</th>\n",
       "      <th>性别</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>汤姆</th>\n",
       "      <td>18</td>\n",
       "      <td>排球</td>\n",
       "      <td>男</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>鲍勃</th>\n",
       "      <td>31</td>\n",
       "      <td>羽毛球</td>\n",
       "      <td>男</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>麦瑞</th>\n",
       "      <td>26</td>\n",
       "      <td>舞蹈</td>\n",
       "      <td>女</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>詹姆斯</th>\n",
       "      <td>41</td>\n",
       "      <td>看球</td>\n",
       "      <td>男</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>亚飞</th>\n",
       "      <td>23</td>\n",
       "      <td>跑步</td>\n",
       "      <td>男</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>琳达</th>\n",
       "      <td>29</td>\n",
       "      <td>长治</td>\n",
       "      <td>女</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "     年龄   爱好 性别\n",
       "汤姆   18   排球  男\n",
       "鲍勃   31  羽毛球  男\n",
       "麦瑞   26   舞蹈  女\n",
       "詹姆斯  41   看球  男\n",
       "亚飞   23   跑步  男\n",
       "琳达   29   长治  女"
      ]
     },
     "execution_count": 134,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# 删除城市列，不修改原数据\n",
    "user_info.drop(columns=['城市'])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "若修改原数据，可以设置inplace参数为True  \n",
    "> user_info.drop(columns=['城市'], inplace=True) "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "方式2：del方式，修改原数据，无返回值"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 135,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>年龄</th>\n",
       "      <th>爱好</th>\n",
       "      <th>城市</th>\n",
       "      <th>性别</th>\n",
       "      <th>score</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>汤姆</th>\n",
       "      <td>18</td>\n",
       "      <td>排球</td>\n",
       "      <td>北京</td>\n",
       "      <td>男</td>\n",
       "      <td>33</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>鲍勃</th>\n",
       "      <td>31</td>\n",
       "      <td>羽毛球</td>\n",
       "      <td>上海</td>\n",
       "      <td>男</td>\n",
       "      <td>89</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>麦瑞</th>\n",
       "      <td>26</td>\n",
       "      <td>舞蹈</td>\n",
       "      <td>广州</td>\n",
       "      <td>女</td>\n",
       "      <td>2</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>詹姆斯</th>\n",
       "      <td>41</td>\n",
       "      <td>看球</td>\n",
       "      <td>深圳</td>\n",
       "      <td>男</td>\n",
       "      <td>15</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>亚飞</th>\n",
       "      <td>23</td>\n",
       "      <td>跑步</td>\n",
       "      <td>晋城</td>\n",
       "      <td>男</td>\n",
       "      <td>39</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>琳达</th>\n",
       "      <td>29</td>\n",
       "      <td>长治</td>\n",
       "      <td>乒乓球</td>\n",
       "      <td>女</td>\n",
       "      <td>93</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "     年龄   爱好   城市 性别  score\n",
       "汤姆   18   排球   北京  男     33\n",
       "鲍勃   31  羽毛球   上海  男     89\n",
       "麦瑞   26   舞蹈   广州  女      2\n",
       "詹姆斯  41   看球   深圳  男     15\n",
       "亚飞   23   跑步   晋城  男     39\n",
       "琳达   29   长治  乒乓球  女     93"
      ]
     },
     "execution_count": 135,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# 添加score列\n",
    "import random\n",
    "score = [random.randint(0, 100) for _ in range(user_info.shape[0])]\n",
    "user_info['score'] = score\n",
    "user_info"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 136,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>年龄</th>\n",
       "      <th>爱好</th>\n",
       "      <th>城市</th>\n",
       "      <th>性别</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>汤姆</th>\n",
       "      <td>18</td>\n",
       "      <td>排球</td>\n",
       "      <td>北京</td>\n",
       "      <td>男</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>鲍勃</th>\n",
       "      <td>31</td>\n",
       "      <td>羽毛球</td>\n",
       "      <td>上海</td>\n",
       "      <td>男</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>麦瑞</th>\n",
       "      <td>26</td>\n",
       "      <td>舞蹈</td>\n",
       "      <td>广州</td>\n",
       "      <td>女</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>詹姆斯</th>\n",
       "      <td>41</td>\n",
       "      <td>看球</td>\n",
       "      <td>深圳</td>\n",
       "      <td>男</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>亚飞</th>\n",
       "      <td>23</td>\n",
       "      <td>跑步</td>\n",
       "      <td>晋城</td>\n",
       "      <td>男</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>琳达</th>\n",
       "      <td>29</td>\n",
       "      <td>长治</td>\n",
       "      <td>乒乓球</td>\n",
       "      <td>女</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "     年龄   爱好   城市 性别\n",
       "汤姆   18   排球   北京  男\n",
       "鲍勃   31  羽毛球   上海  男\n",
       "麦瑞   26   舞蹈   广州  女\n",
       "詹姆斯  41   看球   深圳  男\n",
       "亚飞   23   跑步   晋城  男\n",
       "琳达   29   长治  乒乓球  女"
      ]
     },
     "execution_count": 136,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "del user_info['score']\n",
    "user_info"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "方式3：pop，修改原数据，有返回值"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 137,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>年龄</th>\n",
       "      <th>爱好</th>\n",
       "      <th>城市</th>\n",
       "      <th>性别</th>\n",
       "      <th>score</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>汤姆</th>\n",
       "      <td>18</td>\n",
       "      <td>排球</td>\n",
       "      <td>北京</td>\n",
       "      <td>男</td>\n",
       "      <td>33</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>鲍勃</th>\n",
       "      <td>31</td>\n",
       "      <td>羽毛球</td>\n",
       "      <td>上海</td>\n",
       "      <td>男</td>\n",
       "      <td>89</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>麦瑞</th>\n",
       "      <td>26</td>\n",
       "      <td>舞蹈</td>\n",
       "      <td>广州</td>\n",
       "      <td>女</td>\n",
       "      <td>2</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>詹姆斯</th>\n",
       "      <td>41</td>\n",
       "      <td>看球</td>\n",
       "      <td>深圳</td>\n",
       "      <td>男</td>\n",
       "      <td>15</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>亚飞</th>\n",
       "      <td>23</td>\n",
       "      <td>跑步</td>\n",
       "      <td>晋城</td>\n",
       "      <td>男</td>\n",
       "      <td>39</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>琳达</th>\n",
       "      <td>29</td>\n",
       "      <td>长治</td>\n",
       "      <td>乒乓球</td>\n",
       "      <td>女</td>\n",
       "      <td>93</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "     年龄   爱好   城市 性别  score\n",
       "汤姆   18   排球   北京  男     33\n",
       "鲍勃   31  羽毛球   上海  男     89\n",
       "麦瑞   26   舞蹈   广州  女      2\n",
       "詹姆斯  41   看球   深圳  男     15\n",
       "亚飞   23   跑步   晋城  男     39\n",
       "琳达   29   长治  乒乓球  女     93"
      ]
     },
     "execution_count": 137,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# 添加score列\n",
    "user_info['score'] = score\n",
    "user_info"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 138,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "汤姆     33\n",
       "鲍勃     89\n",
       "麦瑞      2\n",
       "詹姆斯    15\n",
       "亚飞     39\n",
       "琳达     93\n",
       "Name: score, dtype: int64"
      ]
     },
     "execution_count": 138,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "score_data = user_info.pop('score')\n",
    "score_data"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 139,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>年龄</th>\n",
       "      <th>爱好</th>\n",
       "      <th>城市</th>\n",
       "      <th>性别</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>汤姆</th>\n",
       "      <td>18</td>\n",
       "      <td>排球</td>\n",
       "      <td>北京</td>\n",
       "      <td>男</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>鲍勃</th>\n",
       "      <td>31</td>\n",
       "      <td>羽毛球</td>\n",
       "      <td>上海</td>\n",
       "      <td>男</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>麦瑞</th>\n",
       "      <td>26</td>\n",
       "      <td>舞蹈</td>\n",
       "      <td>广州</td>\n",
       "      <td>女</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>詹姆斯</th>\n",
       "      <td>41</td>\n",
       "      <td>看球</td>\n",
       "      <td>深圳</td>\n",
       "      <td>男</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>亚飞</th>\n",
       "      <td>23</td>\n",
       "      <td>跑步</td>\n",
       "      <td>晋城</td>\n",
       "      <td>男</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>琳达</th>\n",
       "      <td>29</td>\n",
       "      <td>长治</td>\n",
       "      <td>乒乓球</td>\n",
       "      <td>女</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "     年龄   爱好   城市 性别\n",
       "汤姆   18   排球   北京  男\n",
       "鲍勃   31  羽毛球   上海  男\n",
       "麦瑞   26   舞蹈   广州  女\n",
       "詹姆斯  41   看球   深圳  男\n",
       "亚飞   23   跑步   晋城  男\n",
       "琳达   29   长治  乒乓球  女"
      ]
     },
     "execution_count": 139,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "user_info"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### 删除指定行"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "drop方式，可以设置是否修改原数据，有返回值"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 140,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>年龄</th>\n",
       "      <th>爱好</th>\n",
       "      <th>城市</th>\n",
       "      <th>性别</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>鲍勃</th>\n",
       "      <td>31</td>\n",
       "      <td>羽毛球</td>\n",
       "      <td>上海</td>\n",
       "      <td>男</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>麦瑞</th>\n",
       "      <td>26</td>\n",
       "      <td>舞蹈</td>\n",
       "      <td>广州</td>\n",
       "      <td>女</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>詹姆斯</th>\n",
       "      <td>41</td>\n",
       "      <td>看球</td>\n",
       "      <td>深圳</td>\n",
       "      <td>男</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>亚飞</th>\n",
       "      <td>23</td>\n",
       "      <td>跑步</td>\n",
       "      <td>晋城</td>\n",
       "      <td>男</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>琳达</th>\n",
       "      <td>29</td>\n",
       "      <td>长治</td>\n",
       "      <td>乒乓球</td>\n",
       "      <td>女</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "     年龄   爱好   城市 性别\n",
       "鲍勃   31  羽毛球   上海  男\n",
       "麦瑞   26   舞蹈   广州  女\n",
       "詹姆斯  41   看球   深圳  男\n",
       "亚飞   23   跑步   晋城  男\n",
       "琳达   29   长治  乒乓球  女"
      ]
     },
     "execution_count": 140,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "user_info.drop(index=['汤姆'])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 141,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>年龄</th>\n",
       "      <th>爱好</th>\n",
       "      <th>城市</th>\n",
       "      <th>性别</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>汤姆</th>\n",
       "      <td>18</td>\n",
       "      <td>排球</td>\n",
       "      <td>北京</td>\n",
       "      <td>男</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>鲍勃</th>\n",
       "      <td>31</td>\n",
       "      <td>羽毛球</td>\n",
       "      <td>上海</td>\n",
       "      <td>男</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>麦瑞</th>\n",
       "      <td>26</td>\n",
       "      <td>舞蹈</td>\n",
       "      <td>广州</td>\n",
       "      <td>女</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>詹姆斯</th>\n",
       "      <td>41</td>\n",
       "      <td>看球</td>\n",
       "      <td>深圳</td>\n",
       "      <td>男</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>亚飞</th>\n",
       "      <td>23</td>\n",
       "      <td>跑步</td>\n",
       "      <td>晋城</td>\n",
       "      <td>男</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>琳达</th>\n",
       "      <td>29</td>\n",
       "      <td>长治</td>\n",
       "      <td>乒乓球</td>\n",
       "      <td>女</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "     年龄   爱好   城市 性别\n",
       "汤姆   18   排球   北京  男\n",
       "鲍勃   31  羽毛球   上海  男\n",
       "麦瑞   26   舞蹈   广州  女\n",
       "詹姆斯  41   看球   深圳  男\n",
       "亚飞   23   跑步   晋城  男\n",
       "琳达   29   长治  乒乓球  女"
      ]
     },
     "execution_count": 141,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "user_info"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 第6部分：DataFrame的遍历"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "在实际数据处理的过程中，我们经常需要对某行或这某列的元素进行一些操作，这时就需要对元素进行遍历。那么pandas中如何对元素进行遍历呢?\n",
    "\n",
    "![](https://img2018.cnblogs.com/blog/1476293/201903/1476293-20190312223939625-1790211193.png)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 遍历列"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 142,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "年龄\n",
      "爱好\n",
      "城市\n",
      "性别\n"
     ]
    }
   ],
   "source": [
    "for col in user_info:\n",
    "    print(col)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 143,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "年龄\n",
      "汤姆     18\n",
      "鲍勃     31\n",
      "麦瑞     26\n",
      "詹姆斯    41\n",
      "亚飞     23\n",
      "琳达     29\n",
      "Name: 年龄, dtype: int64\n",
      "爱好\n",
      "汤姆      排球\n",
      "鲍勃     羽毛球\n",
      "麦瑞      舞蹈\n",
      "詹姆斯     看球\n",
      "亚飞      跑步\n",
      "琳达      长治\n",
      "Name: 爱好, dtype: object\n",
      "城市\n",
      "汤姆      北京\n",
      "鲍勃      上海\n",
      "麦瑞      广州\n",
      "詹姆斯     深圳\n",
      "亚飞      晋城\n",
      "琳达     乒乓球\n",
      "Name: 城市, dtype: object\n",
      "性别\n",
      "汤姆     男\n",
      "鲍勃     男\n",
      "麦瑞     女\n",
      "詹姆斯    男\n",
      "亚飞     男\n",
      "琳达     女\n",
      "Name: 性别, dtype: object\n"
     ]
    }
   ],
   "source": [
    "for col in user_info:\n",
    "    print(col)\n",
    "    print(user_info[col])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 144,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "汤姆     19\n",
       "鲍勃     32\n",
       "麦瑞     27\n",
       "詹姆斯    42\n",
       "亚飞     24\n",
       "琳达     30\n",
       "Name: 年龄, dtype: int64"
      ]
     },
     "execution_count": 144,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# 遍历单列\n",
    "user_info['年龄'].apply(lambda x: x + 1)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 145,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "年龄 18 31 26 41 23 29\n",
      "爱好 排球 羽毛球 舞蹈 看球 跑步 长治\n",
      "城市 北京 上海 广州 深圳 晋城 乒乓球\n",
      "性别 男 男 女 男 男 女\n"
     ]
    },
    {
     "data": {
      "text/plain": [
       "年龄    None\n",
       "爱好    None\n",
       "城市    None\n",
       "性别    None\n",
       "dtype: object"
      ]
     },
     "execution_count": 145,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# 遍历所有列\n",
    "def desc(row):\n",
    "    print(row.name, row['汤姆'], row['鲍勃'], row['麦瑞'], row['詹姆斯'], row['亚飞'], row['琳达'])\n",
    "\n",
    "user_info.apply(desc, axis=0)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 遍历行"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 146,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "汤姆 18 排球 北京 男\n",
      "鲍勃 31 羽毛球 上海 男\n",
      "麦瑞 26 舞蹈 广州 女\n",
      "詹姆斯 41 看球 深圳 男\n",
      "亚飞 23 跑步 晋城 男\n",
      "琳达 29 长治 乒乓球 女\n"
     ]
    },
    {
     "data": {
      "text/plain": [
       "汤姆     None\n",
       "鲍勃     None\n",
       "麦瑞     None\n",
       "詹姆斯    None\n",
       "亚飞     None\n",
       "琳达     None\n",
       "dtype: object"
      ]
     },
     "execution_count": 146,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "def desc(row):\n",
    "    print(row.name, row['年龄'], row['爱好'], row['城市'], row['性别'])\n",
    "\n",
    "user_info.apply(desc, axis=1)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 147,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "汤姆 18 排球 北京 男\n",
      "鲍勃 31 羽毛球 上海 男\n",
      "麦瑞 26 舞蹈 广州 女\n",
      "詹姆斯 41 看球 深圳 男\n",
      "亚飞 23 跑步 晋城 男\n",
      "琳达 29 长治 乒乓球 女\n"
     ]
    }
   ],
   "source": [
    "for index, row in user_info.iterrows():\n",
    "    print(index,row['年龄'], row['爱好'], row['城市'], row['性别'])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 148,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "汤姆 18 排球 北京 男\n",
      "鲍勃 31 羽毛球 上海 男\n",
      "麦瑞 26 舞蹈 广州 女\n",
      "詹姆斯 41 看球 深圳 男\n",
      "亚飞 23 跑步 晋城 男\n",
      "琳达 29 长治 乒乓球 女\n"
     ]
    }
   ],
   "source": [
    "for index in user_info.index:\n",
    "    print(index, user_info.loc[index]['年龄'],user_info.loc[index]['爱好'], user_info.loc[index]['城市'],user_info.loc[index]['性别'])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 149,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "18 排球 北京 男\n",
      "31 羽毛球 上海 男\n",
      "26 舞蹈 广州 女\n",
      "41 看球 深圳 男\n",
      "23 跑步 晋城 男\n",
      "29 长治 乒乓球 女\n"
     ]
    }
   ],
   "source": [
    "for row in user_info.values:\n",
    "    print(row[0], row[1], row[2], row[3])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 简单小结"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "本文介绍了pandas的第二大数据结构DataFrame的相关知识点。包含创建、属性、方法、以及增删改查和遍历。DataFrame是pandas数据的核心，所有的方法基本都是围绕Series和DataFrame展开的，本文介绍的当然不是全部，后面的章节会慢慢补充。当然，通过本文对于DataFrame的学习，相信你已经对DataFrame有了基本的了解，并掌握了其基本用法，别急，好戏才刚刚开始！"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.8.5"
  },
  "toc": {
   "base_numbering": 1,
   "nav_menu": {},
   "number_sections": false,
   "sideBar": true,
   "skip_h1_title": false,
   "title_cell": "Table of Contents",
   "title_sidebar": "Contents",
   "toc_cell": false,
   "toc_position": {
    "height": "calc(100% - 180px)",
    "left": "10px",
    "top": "150px",
    "width": "320px"
   },
   "toc_section_display": true,
   "toc_window_display": true
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}
